**Sriram Sankaranarayanan Natasha Sharygina (Eds.)**

# **Tools and Algorithms for the Construction and Analysis of Systems**

**29th International Conference, TACAS 2023, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2023, Paris, France, April 22–27, 2023, Proceedings, Part II**

## Lecture Notes in Computer Science 13994

Founding Editors

Gerhard Goos, Germany Juris Hartmanis, USA

### Editorial Board Members

Elisa Bertino, USA Wen Gao, China

Bernhard Steffen, Germany Moti Yung, USA

## Advanced Research in Computing and Software Science Subline of Lecture Notes in Computer Science

Subline Series Editors

Giorgio Ausiello, University of Rome 'La Sapienza', Italy Vladimiro Sassone, University of Southampton, UK

Subline Advisory Board

Susanne Albers, TU Munich, Germany Benjamin C. Pierce, University of Pennsylvania, USA Bernhard Steffen, University of Dortmund, Germany Deng Xiaotie, Peking University, Beijing, China Jeannette M. Wing, Microsoft Research, Redmond, WA, USA

More information about this series at https://link.springer.com/bookseries/558


Editors Sriram Sankaranarayanan University of Colorado Boulder, CO, USA

Natasha Sharygina University of Lugano Lugano, Switzerland

ISSN 0302-9743 ISSN 1611-3349 (electronic) Lecture Notes in Computer Science ISBN 978-3-031-30819-2 ISBN 978-3-031-30820-8 (eBook) https://doi.org/10.1007/978-3-031-30820-8

© The Editor(s) (if applicable) and The Author(s) 2023. This book is an open access publication.

Open Access This book is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

## ETAPS Foreword

Welcome to the 26th ETAPS! ETAPS 2023 took place in Paris, the beautiful capital of France. ETAPS 2023 was the 26th instance of the European Joint Conferences on Theory and Practice of Software. ETAPS is an annual federated conference established in 1998, and consists of four conferences: ESOP, FASE, FoSSaCS, and TACAS. Each conference has its own Program Committee (PC) and its own Steering Committee (SC). The conferences cover various aspects of software systems, ranging from theoretical computer science to foundations of programming languages, analysis tools, and formal approaches to software engineering. Organising these conferences in a coherent, highly synchronized conference programme enables researchers to participate in an exciting event, having the possibility to meet many colleagues working in different directions in the field, and to easily attend talks of different conferences. On the weekend before the main conference, numerous satellite workshops took place that attracted many researchers from all over the globe.

ETAPS 2023 received 361 submissions in total, 124 of which were accepted, yielding an overall acceptance rate of 34.3%. I thank all the authors for their interest in ETAPS, all the reviewers for their reviewing efforts, the PC members for their contributions, and in particular the PC (co-)chairs for their hard work in running this entire intensive process. Last but not least, my congratulations to all authors of the accepted papers!

ETAPS 2023 featured the unifying invited speakers Véronique Cortier (CNRS, LORIA laboratory, France) and Thomas A. Henzinger (Institute of Science and Technology, Austria) and the conference-specific invited speakers Mooly Sagiv (Tel Aviv University, Israel) for ESOP and Sven Apel (Saarland University, Germany) for FASE. Invited tutorials were provided by Ana-Lucia Varbanescu (University of Twente and University of Amsterdam, The Netherlands) on heterogeneous computing and Joost-Pieter Katoen (RWTH Aachen, Germany and University of Twente, The Netherlands) on probabilistic programming.

As part of the programme we had the second edition of TOOLympics, an event to celebrate the achievements of the various competitions or comparative evaluations in the field of ETAPS.

ETAPS 2023 was organized jointly by Sorbonne Université and Université Sorbonne Paris Nord. Sorbonne Université (SU) is a multidisciplinary, research-intensive and world-class academic institution. It was created in 2018 as the merger of two first-class research-intensive universities, UPMC (Université Pierre et Marie Curie) and Paris-Sorbonne. SU has three faculties (humanities, medicine, and science and engineering), 55,600 students (4,700 PhD students; 10,200 international students), 6,400 teachers and professor-researchers, and 3,600 administrative and technical staff members. Université Sorbonne Paris Nord is one of the thirteen universities that succeeded the University of Paris in 1968. It is a major teaching and research center located in the north of Paris. It has five campuses, spread over the two departments of Seine-Saint-Denis and Val d'Oise: Villetaneuse, Bobigny, Saint-Denis, the Plaine Saint-Denis and Argenteuil. The university has more than 25,000 students in different fields, such as health, medicine, languages, humanities, and science.

The local organization team consisted of Fabrice Kordon (general co-chair), Laure Petrucci (general co-chair), Benedikt Bollig (workshops), Stefan Haar (workshops), Étienne André (proceedings and tutorials), Céline Ghibaudo (sponsoring), Denis Poitrenaud (web), Stefan Schwoon (web), Benoît Barbot (publicity), Nathalie Sznajder (publicity), Anne-Marie Reytier (communication), Hélène Pétridis (finance) and Véronique Criart (finance).

ETAPS 2023 is further supported by the following associations and societies: ETAPS e.V., EATCS (European Association for Theoretical Computer Science), EAPLS (European Association for Programming Languages and Systems), EASST (European Association of Software Science and Technology), Lip6 (Laboratoire d'Informatique de Paris 6), LIPN (Laboratoire d'informatique de Paris Nord), Sorbonne Université, Université Sorbonne Paris Nord, CNRS (Centre national de la recherche scientifique), CEA (Commissariat à l'énergie atomique et aux énergies alternatives), LMF (Laboratoire méthodes formelles), and Inria (Institut national de recherche en informatique et en automatique).

The ETAPS Steering Committee consists of an Executive Board, and representatives of the individual ETAPS conferences, as well as representatives of EATCS, EAPLS, and EASST. The Executive Board consists of Holger Hermanns (Saarbrücken), Marieke Huisman (Twente, chair), Jan Kofroň (Prague), Barbara König (Duisburg), Thomas Noll (Aachen), Caterina Urban (Inria), Jan Křetínský (Munich), and Lenore Zuck (Chicago).

Other members of the steering committee are: Dirk Beyer (Munich), Luís Caires (Lisboa), Ana Cavalcanti (York), Bernd Finkbeiner (Saarland), Reiko Heckel (Leicester), Joost-Pieter Katoen (Aachen and Twente), Naoki Kobayashi (Tokyo), Fabrice Kordon (Paris), Laura Kovács (Vienna), Orna Kupferman (Jerusalem), Leen Lambers (Cottbus), Tiziana Margaria (Limerick), Andrzej Murawski (Oxford), Laure Petrucci (Paris), Elizabeth Polgreen (Edinburgh), Peter Ryan (Luxembourg), Sriram Sankaranarayanan (Boulder), Don Sannella (Edinburgh), Natasha Sharygina (Lugano), Pawel Sobocinski (Tallinn), Sebastián Uchitel (London and Buenos Aires), Andrzej Wasowski (Copenhagen), Stephanie Weirich (Pennsylvania), Thomas Wies (New York), Anton Wijs (Eindhoven), and James Worrell (Oxford).

I would like to take this opportunity to thank all authors, keynote speakers, attendees, organizers of the satellite workshops, and Springer-Verlag GmbH for their support. I hope you all enjoyed ETAPS 2023.

Finally, a big thanks to Laure and Fabrice and their local organization team for all their enormous efforts to make ETAPS a fantastic event.

April 2023

Marieke Huisman, ETAPS SC Chair, ETAPS e.V. President

### Preface

We are pleased to present the proceedings of TACAS 2023, the 29th edition of the International Conference on Tools and Algorithms for the Construction and Analysis of Systems, held as part of the 26th European Joint Conferences on Theory and Practice of Software (ETAPS 2023), April 22–27, 2023 in Paris, France. TACAS brings together a community of researchers, developers, and end-users who are broadly interested in rigorous algorithmic techniques for the construction and analysis of systems. The conference is a venue that interleaves various disciplines including formal verification of software and hardware systems, static analysis, program synthesis, verification of machine learning/autonomous systems, probabilistic programming, SAT/SMT solving, constraint solving, automated theorem proving, and cyber-physical systems.

There were four submission categories for TACAS 2023: regular research papers, case study papers, regular tool papers, and tool demonstration papers.


Regular research, case study, and regular tool papers were restricted to a total of sixteen pages, and tool demonstration papers to six pages, exclusive of references.

This year 169 papers were submitted to TACAS, consisting of 119 regular research papers, 34 regular tool and case study papers, and 16 tool demonstration papers. Each paper was reviewed by three Program Committee (PC) members, who made use of subreviewers. As a result, the PC accepted in total 62 papers, among which there were 45 regular papers, 11 regular tool/case-study papers and 6 tool demonstration papers. The PC members were pleasantly surprised by an unusually large number of strong submissions. Almost all accepted papers had either all positive reviews or a "championing" program committee member who argued in favor of accepting the paper. Furthermore, all accepted papers had a positive average score. One paper was accepted conditionally and successfully "shepherded" by the PC.

Similarly to previous years, it was possible to submit an artifact alongside a paper, which was mandatory for regular tool and tool demonstration papers. An artifact might consist of tools, models, proofs, or other data required for validation of the results of the paper. The Artifact Evaluation Committee (AEC) reviewed the artifacts based on their documentation, ease of use, and, most importantly, whether the results presented in the corresponding paper could be accurately reproduced. The evaluation was carried out using a standardized virtual machine to ensure consistency of the results, except for 4 artifacts that had special hardware or software requirements. The evaluation had two rounds. The first round was carried out in parallel with the work of the PC and evaluated the artifacts for all the submitted regular tool and tool demo papers. The judgment of the AEC was communicated to the PC and weighed in their discussion (the PC rejected a total of 4 papers in this phase). The second round took place after the paper acceptance notifications were sent out so the authors of accepted research and case-study papers could submit their artifacts. In both rounds, the AEC provided 3 reviews per artifact and communicated with the authors to resolve apparent technical issues. In total, 69 artifacts were submitted (51 in the first round and 18 in the second), and the AEC evaluated a total of 64 artifacts regarding their availability, functionality, and/or reusability. Finally, among the 62 accepted papers, the AEC awarded 32 functional badges, 21 reusable badges, and 33 available badges. Such badges appear on the first page of each paper to certify the properties of each artifact.

As a separate conference track, TACAS 2023 hosted the 12th Competition on Software Verification (SV-COMP 2023). SV-COMP is the annual comparative evaluation of tools for automatic software verification and witness validation. The TACAS proceedings contain a selection of 13 short papers that describe participating verification systems and a report presenting the results of the competition. These papers were reviewed by a separate program committee (the competition jury); each of the papers was assessed by at least three reviewers. A total of 52 verification systems were systematically evaluated, with 34 developer teams from ten countries, including five submissions from industry. Two sessions in the TACAS program were reserved for the competition: presentations by the competition chair and the participating development teams in the first session and an open community meeting in the second session.

We would like to thank all the people who helped to make TACAS 2023 successful. First, we would like to thank the authors for submitting their papers to TACAS 2023. The PC members and additional reviewers did a great job in reviewing papers: they contributed informed and detailed reports and engaged in the PC discussions. We also thank the steering committee, and especially its chair, Joost-Pieter Katoen, for his valuable advice. Lastly, we would like to thank the overall organization team of ETAPS 2023.

April 2023

Sriram Sankaranarayanan, Natasha Sharygina, Grigory Fedyukovich, Sergio Mover, Dirk Beyer

### Organization

### Program Committee Chairs


### Program Committee

| Name | Affiliation |
| --- | --- |
| Christel Baier | TU Dresden, Germany |
| Haniel Barbosa | Universidade Federal de Minas Gerais, Brazil |
| Ezio Bartocci | TU Wien, Austria |
| Dirk Beyer | LMU Munich, Germany |
| Armin Biere | Freiburg, Germany |
| Nikolaj Bjørner | Microsoft, USA |
| Roderick Bloem | Graz University of Technology, Austria |
| Ahmed Bouajjani | IRIF, Université Paris Cité, France |
| Hana Chockler | King's College London, UK |
| Alessandro Cimatti | Fondazione Bruno Kessler, Italy |
| Rance Cleaveland | University of Maryland, USA |
| Javier Esparza | TU Munich, Germany |
| Chuchu Fan | MIT, USA |
| Grigory Fedyukovich | Florida State University, USA |
| Bernd Finkbeiner | CISPA Helmholtz Center for Information Security, Germany |
| Martin Fränzle | Carl von Ossietzky Universität Oldenburg, Germany |
| Khalil Ghorbal | Inria, France |
| Laure Gonnord | Grenoble-INP/LCIS, France |
| Orna Grumberg | Technion - Israel Institute of Technology, Israel |
| Kim Guldstrand Larsen | Aalborg University, Denmark |
| Arie Gurfinkel | University of Waterloo, Canada |
| Ranjit Jhala | University of California, San Diego, USA |
| Laura Kovacs | TU Wien, Austria |
| Alexander Kulikov | St. Petersburg Department of Steklov Institute of Mathematics, Russia |
| Bettina Könighofer | Graz University of Technology, Austria |
| Wenchao Li | Boston University, USA |
| Sergio Mover | Ecole Polytechnique, France |
| Peter Müller | ETH Zurich, Switzerland |
| Kedar Namjoshi | Nokia Bell Labs, USA |
| Aina Niemetz | Stanford University, USA |
| Corina Pasareanu | CMU, NASA, KBR, USA |
| Nir Piterman | University of Gothenburg, Sweden |


### Artifact Evaluation Committee Chairs


### Artifact Evaluation Committee



### Program Committee and Jury—SV-COMP

| Name (tool) | Affiliation |
| --- | --- |
| Dirk Beyer (Chair) | LMU Munich, Germany |
| Viktor Malík (2LS) | TU Brno, Czechia |
| Lei Bu (BRICK) | Nanjing University, China |
| Marek Chalupa (Bubaak) | ISTA, Austria |
| Michael Tautschnig (CBMC) | Queen Mary University London, UK |
| Henrik Wachowitz (CPAchecker) | LMU Munich, Germany |
| Hernán Ponce de León (Dartagnan) | Huawei Dresden Research, Germany |
| Fei He (Deagle) | Tsinghua University, China |
| Fatimah Aljaafari (EBF) | University of Manchester, UK |
| Rafael Sá Menezes (ESBMC-kind) | University of Manchester, UK |
| Martin Spiessl (Frama-C-SV) | LMU Munich, Germany |
| Falk Howar (GDart, GDart-LLVM) | TU Dortmund, Germany |
| Simmo Saan (Goblint) | University of Tartu, Estonia |
| William Leeson (Graves-CPA, Graves-Par) | University of Virginia, USA |
| Soha Hussein (Java-Ranger) | University of Minnesota, USA |
| Peter Schrammel (JBMC) | University of Sussex/Diffblue, UK |
| Gidon Ernst (Korn) | LMU Munich, Germany |
| Tong Wu (LF-checker) | University of Manchester, UK |
| Vesal Vojdani (Locksmith) | University of Tartu, Estonia |
| Lei Bu (MLB) | Nanjing University, China |
| Raphaël Monat (Mopsa) | Inria and University of Lille, France |
| Cedric Richter (PeSCo-CPA) | University of Oldenburg, Germany |
| Jie Su (PIChecker) | Xidian University, China |
| Marek Trtik (Symbiotic) | Masaryk University, Brno, Czechia |
| Levente Bajczi (Theta) | Budapest University of Technology and Economics, Hungary |


### Steering Committee

| Name | Affiliation |
| --- | --- |
| Dirk Beyer | LMU Munich, Germany |
| Rance Cleaveland | University of Maryland, USA |
| Holger Hermanns | Universität des Saarlandes, Germany |
| Joost-Pieter Katoen (Chair) | RWTH Aachen, Germany and Universiteit Twente, Netherlands |
| Kim G. Larsen | Aalborg University, Denmark |
| Bernhard Steffen | Technische Universität Dortmund, Germany |

### Additional Reviewers

Abd Alrahman, Yehia Ahmad, H. M. Sabbir An, Jie Asarin, Eugene Azzopardi, Shaun Bacci, Giorgio Baier, Daniel Balakrishnan, Gogul Balasubramanian, A. R. Baumeister, Jan Becchi, Anna Ben Shimon, Yoav Berger, Guillaume Beutner, Raven Bily, Aurel Blicha, Martin Bombardelli, Alberto Brieger, Marvin Brizzio, Matías Bunk, Thomas Caillaud, Benoît Cano Córdoba, Filip

Ceresa, Martin Ceska, Milan Chen, Mingshuai Chen, Xin Chen, Yilei Chiari, Michele Czerner, Philipp Dardinier, Thibault Dawson, Charles De Masellis, Riccardo Debrestian, Darin Di Stefano, Luca Egolf, Derek Elad, Neta Elashkin, Andrey Esen, Zafer Fazekas, Katalin Feng, Shenghua Ferres, Bruno Fiedor, Jan Fleury, Mathias Fontaine, Pascal

Frenkel, Eden Frenkel, Hadar Froleyks, Nils Fu, Feisi Garcia-Contreras, Isabel Garg, Kunal Georgiou, Pamina Gianola, Alessandro Gigerl, Barbara Goorden, Martijn Gorostiaga, Felipe Goyal, Srajan Griggio, Alberto Grosen, Thomas Møller Gstrein, Bernhard Gupta, Ashutosh Habermehl, Peter Hader, Thomas Hadzic, Vedad Hagemann, Willem Hamza, Ameer Haring, Johannes Hausmann, Daniel Havlena, Vojtěch Hermo, Montserrat Holík, Lukáš Hozzová, Petra Huang, Chao Huang, Chengchao Hyvärinen, Antti Itzhaky, Shachar Jacobs, Swen Jaeger, Manfred Jansen, David N. Jensen, Nicolaj Østerby Jha, Prabhat Jonas, Martin Junges, Sebastian Kaki, Gowtham Kaufmann, Daniela Kenison, George Kettl, Matthias Khalimov, Ayrat Kifetew, Fitsum Kiourti, Panagiota Klüppelholz, Sascha

Kröger, Paul Käfer, Nikolai Lal, Akash Larrauri, Alberto Larraz, Daniel Lazic, Marijana Le, Nham Lee, Nian-Ze Lengal, Ondrej Li, Renjue Lidell, David Liu, Jiaxiang Lopez-Miguel, Ignacio D. Luttenberger, Michael Macías, Fernando Maderbacher, Benedikt McClurg, Jedidiah Meng, Yue Metzger, Niklas Michelland, Sebastien Monniaux, David Moosbrugger, Marcel Nadel, Alexander Nam, Seunghyeon Nesterini, Eleonora Neufeld, Emery Nickovic, Dejan Noetzli, Andres Oliveira Da Costa, Ana Otoni, Rodrigo Parthasarathy, Gaurav Paxian, Tobias Pluska, Alexander Poli, Federico Pontiggia, Francesco Prandi, Davide Pranger, Stefan Preiner, Mathias Radanne, Gabriel Rakow, Astrid Rappoport, Omer Rauh, Andreas Rawson, Michael Rebola Pardo, Adrian Reynolds, Andrew Riley, Daniel

Rodriguez, Andoni Rogalewicz, Adam Román Calvo, Enrique Rubio, Rubén Rutledge, Kwesi Sallinger, Sarah Sankaranarayanan, Sriram Schlichtkrull, Anders Schoisswohl, Johannes Schultz, William Schupp, Stefan Schwammberger, Maike Sextl, Florian Siber, Julian So, Oswin Sogokon, Andrew Spiessl, Martin Steen, Alexander Su, Yusen Susi, Angelo Síč, Juraj Tappler, Martin Thibault, Joan Ting, Gan Treml, Lilly Maria Trivedi, Ashutosh

Turrini, Andrea Varanasi, Sarat Chandra Vediramana Krishnan, Hari Govind Visconti, Ennio Wachowitz, Henrik Wand, Michael Wardega, Kacper Weininger, Maximilian Wendler, Philipp Wienhöft, Patrick Wu, Hao Wu, Haoze Xue, Anton Yadav, Drishti Yang, Pengfei Yang, Ruixiao Yu, Chenning Yu, Mingxin Zavalia, Lucas Zhan, Bohua Zhang, Hanwei Zhang, Songyuan Zhou, Weichao Zhou, Yuhao Zimmermann, Martin Zlatkin, Ilia

### Contents – Part II

#### Tool Demos




#### Tools (Regular Papers)



#### Graphs/Probabilistic Systems


#### Runtime Monitoring/Program Analysis


#### 12th Competition on Software Verification — SV-COMP 2023




### Contents – Part I

#### Invited Talk


#### Machine Learning/Neural Networks



#### Constraint Solving/Blockchain


#### Markov Chains/Stochastic Control


#### Verification




## **Tool Demos**

## EVA: a Tool for the Compositional Verification of AUTOSAR Models

Alessandro Cimatti<sup>1</sup>, Luca Cristoforetti<sup>1</sup>, Alberto Griggio<sup>1</sup>, Stefano Tonetta<sup>1</sup>, Sara Corfini<sup>2</sup> (✉), Marco Di Natale<sup>2,4</sup>, and Florian Barrau<sup>3</sup>

<sup>1</sup> Fondazione Bruno Kessler, Trento, Italy

<sup>2</sup> Huawei Pisa Research Center, Pisa, Italy, s.corfini@huawei.com

<sup>3</sup> Huawei Grenoble Research Center, Grenoble, France

<sup>4</sup> Scuola Superiore Sant'Anna, Pisa, Italy

Abstract. We present EVA, a framework for the integration of modern verification tools in the context of AUTOSAR, a widely-used open standard for the development of automotive software systems. Our framework enables the automatic end-to-end verification of system-level properties using a compositional approach. It combines software model checking techniques for the verification of software components at the code level with a contract-based analysis for verifying their correct composition. In this paper, we present the tool through its application on a representative automotive case study, discussing the main functionalities provided and the results obtained.

### 1 Introduction

AUTOSAR [1] is a worldwide consortium of car manufacturers and component or service providers in the automotive domain, with the main goal of providing a standardized software architecture for the development and execution of software components. One of the fundamental challenges in designing software for the AUTOSAR platform is ensuring safety. To this end, the application of formal methods – and in particular automatic (or semi-automatic) techniques based on model checking and theorem proving – is receiving significant interest as a complement to more traditional V&V techniques. In this paper we present EVA, a framework for the integration of modern verification tools in the context of AUTOSAR. EVA adopts a model-based compositional verification approach founded on the contract-based methodology of [8]. The tool allows the automatic end-to-end verification of system-level properties, and combines software model checking techniques for the verification of software components at the code level with a contract-based analysis for verifying their correct composition. EVA also implements the features required for usability in a typical industrial context, including a front-end integrated in a standard AUTOSAR development environment [2] with a user-friendly (formal) property editor, the automatic generation of code stubs, and other views and forms that help the user manage verification in an AUTOSAR environment.


Fig. 1. BrakeCommand and CruiseControl components.

We present EVA through its application to a representative case study, which describes a simplified active safety automotive system containing some of the typical safety functions available in modern vehicles (such as lane departure warning, cruise control and a fault-tolerant brake pedal system). The example is meant to show the potential of the tool as a driver for a more widespread adoption of formal methods and contract-based verification in the industrial automotive context. Specifically, we introduce the case study in §2 and we describe the typical verification workflow followed by a user of EVA in §3. Finally, in §4 we discuss the main verification results obtained.

### 2 A Case Study for Verification in AUTOSAR

AUTOSAR defines the reference architecture for the development of automotive systems and provides the language (meta-model) for describing their architectural models. An AUTOSAR application consists of a hierarchy of components connected through ports. Provide ports represent output ports and require ports correspond to the input ports. Connectors represent data flow from one port to another. An AUTOSAR port can be classified as sender-receiver or client-server and sender-receiver communications can be queued or non-queued (i.e., with no buffering and the receiver always accesses the last sent data). In this paper we assume that all ports are sender-receiver and non-queued.
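The port model described above can be sketched in a few lines of Python. This is only an illustration, not the AUTOSAR meta-model or EVA's internal representation: the classes `Port` and `Connector`, the helper `send`, and the port names `BrakeOut`/`BrakeIn` are hypothetical. The one behaviour taken from the text is the non-queued sender-receiver semantics, where the receiver always sees the last sent value.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class Port:
    """A non-queued sender-receiver port: it holds only the last value."""
    name: str
    kind: str  # "provide" (output) or "require" (input)
    value: Optional[Any] = None


@dataclass
class Connector:
    """Data flow from a provide port to a require port."""
    source: Port
    target: Port

    def __post_init__(self) -> None:
        if self.source.kind != "provide" or self.target.kind != "require":
            raise ValueError("a connector goes from a provide port to a require port")

    def propagate(self) -> None:
        # No buffering: the previous value on the receiver is overwritten.
        self.target.value = self.source.value


def send(port: Port, value: Any, connectors: List[Connector]) -> None:
    """Write a value on a provide port and push it to the connected receivers."""
    port.value = value
    for connector in connectors:
        if connector.source is port:
            connector.propagate()


# Hypothetical wiring, loosely inspired by the case study.
brake_out = Port("BrakeOut", "provide")
brake_in = Port("BrakeIn", "require")
wiring = [Connector(brake_out, brake_in)]

send(brake_out, 0.2, wiring)
send(brake_out, 0.7, wiring)  # overwrites 0.2 before the receiver reads it
print(brake_in.value)  # 0.7
```

A queued port would instead append each value to a buffer; under the paper's assumption that all ports are non-queued, a single slot per port is enough.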

An atomic software component consists of a set of runnables. A runnable is a sequence of operations started by the Run-Time Environment (RTE). Each runnable is configured so that it is triggered by an event, which can be a timing event, data sent or received, an operation invocation, the return of a server call, a mode switch, or an external event. A special init event is used for runnables that are executed when the RTE starts and initializes the software components.
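As a rough illustration of event-triggered runnables, the following sketch dispatches runnables from a table indexed by event kind. The names `EventKind` and `MiniRte` are hypothetical and do not correspond to the AUTOSAR RTE API; the event kinds simply mirror the list given in the text.

```python
from enum import Enum, auto
from typing import Callable, Dict, List


class EventKind(Enum):
    INIT = auto()
    TIMING = auto()
    DATA_SENT = auto()
    DATA_RECEIVED = auto()
    OPERATION_INVOKED = auto()
    SERVER_CALL_RETURNED = auto()
    MODE_SWITCH = auto()
    EXTERNAL = auto()


class MiniRte:
    """Toy stand-in for a run-time environment that starts runnables."""

    def __init__(self) -> None:
        self._table: Dict[EventKind, List[Callable[[], None]]] = {}

    def register(self, kind: EventKind, runnable: Callable[[], None]) -> None:
        self._table.setdefault(kind, []).append(runnable)

    def fire(self, kind: EventKind) -> None:
        # Start every runnable configured for this event, in registration order.
        for runnable in self._table.get(kind, []):
            runnable()


log: List[str] = []
rte = MiniRte()
rte.register(EventKind.INIT, lambda: log.append("init"))
rte.register(EventKind.TIMING, lambda: log.append("step"))

rte.fire(EventKind.INIT)    # executed once, when the RTE starts
rte.fire(EventKind.TIMING)  # periodic activation
rte.fire(EventKind.TIMING)
print(log)  # ['init', 'step', 'step']
```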

We illustrate the basic notions above by means of a simple but representative case study, which we use to present the main features of EVA. Figure 1 overviews (a section of) the architecture of the sample application. It collects 22 atomic components (including sensors, controllers and actuators) plus one composite component (AUTOSL) that represents the whole system, and implements some of the typical safety functions available in modern vehicles, such as autonomous emergency braking, lane departure warning, crash preparation and cruise control. We implemented (the runnables of) 9 components: 7 were coded manually and 2 were generated from a Simulink model using the Embedded Coder Support Package for AUTOSAR. The other components are treated as stubs because their data come from lower levels (hardware sensors) and we assume that the values they provide are correct.

The case study considers various safety properties, both at the level of the whole system and at the level of the implementation of individual components or runnables. As an example, we describe here two properties, a system-level one and a component-level one, both concerning the behaviour of the cruise controller. Specifically, the cruise controller is expected to react to a brake input by disengaging itself within two execution steps. At the implementation level, the requirement relates the input and output ports of the CruiseControl periodic runnable, stating that whenever the CruiseControl CCActive port is true and the Brake input port is true, then the CCActive output port must become false in at most two steps. At the system level, instead, the same requirement relates the behaviour of the components BrakeCommand and CruiseControl, stating that the cruise control shall be disengaged if the user brakes, even when one of the two brake pedal sensors is faulty.
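To make the component-level requirement concrete, here is a small trace checker. It is a sketch under stated assumptions, not EVA's verification procedure (which relies on model checking): a trace is a list of `(cc_active, brake)` pairs, one per execution step of the runnable, and "within two steps" is read as "at one of the next two steps". The function name `disengages_within` is hypothetical.

```python
from typing import List, Tuple

Step = Tuple[bool, bool]  # (cc_active, brake) observed at one execution step


def disengages_within(trace: List[Step], bound: int = 2) -> bool:
    """Check that every braking step with cruise control active is followed
    by a step with cruise control inactive within `bound` steps."""
    for i, (cc_active, brake) in enumerate(trace):
        if cc_active and brake:
            window = trace[i + 1 : i + 1 + bound]
            disengaged = any(not cc for cc, _ in window)
            # If the trace ends before the deadline, the obligation is still
            # pending: treat it as inconclusive rather than as a violation.
            if not disengaged and len(window) == bound:
                return False
    return True


good = [(True, False), (True, True), (True, False), (False, False)]
bad = [(True, True), (True, False), (True, False), (False, False)]

print(disengages_within(good))  # True
print(disengages_within(bad))   # False
```

A finite trace can only refute the property; establishing it for all executions is what the model checking engines described in §3 are for.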

### 3 EVA Verification Workflow

EVA integrates the verification engines Kratos2 [6] and OCRA [5] into an AUTOSAR analysis toolchain. The ultimate goal is to automate the verification of formal properties (contracts) on AUTOSAR models. In its default configuration, EVA uses a portfolio of state-of-the-art SAT- and SMT-based symbolic model checking algorithms (implemented in Kratos2 and OCRA), including different variants of bit-level IC3 [10,12], IC3 with implicit abstraction [7], bounded model checking [3], and k-induction [11].
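The idea of combining engines can be illustrated on a toy example. The sketch below is an explicit-state rendition of bounded model checking and k-induction on a tiny finite system; it is not how Kratos2 or OCRA work (those are symbolic and SAT/SMT-based), and the sequential `portfolio` loop is only one possible way to combine engines. All names (`paths`, `bmc`, `k_induction`, `portfolio`) and the example system are hypothetical.

```python
from typing import Callable, Iterable, List, Optional, Tuple

State = int


def paths(starts: Iterable[State], succ: Callable[[State], Iterable[State]],
          length: int) -> List[List[State]]:
    """All paths with exactly `length` transitions that begin in `starts`."""
    frontier = [[s] for s in starts]
    for _ in range(length):
        frontier = [p + [t] for p in frontier for t in succ(p[-1])]
    return frontier


def bmc(init, succ, prop, k: int) -> Optional[List[State]]:
    """Bounded model checking: a counterexample with at most k transitions, or None."""
    for depth in range(k + 1):
        for p in paths(init, succ, depth):
            if not prop(p[-1]):
                return p
    return None


def k_induction(states, init, succ, prop, k: int) -> bool:
    """True if prop is proved by k-induction (k >= 1), False if inconclusive."""
    # Base case: the first k states of every run satisfy prop.
    if bmc(init, succ, prop, k - 1) is not None:
        return False
    # Inductive step: k consecutive good states (reachable or not)
    # are always followed by a good state.
    for p in paths(states, succ, k):
        if all(prop(s) for s in p[:-1]) and not prop(p[-1]):
            return False
    return True


def portfolio(states, init, succ, prop, max_k: int) -> Tuple[str, object]:
    """Alternate a bug-finding engine and a proving engine with growing bounds."""
    for k in range(1, max_k + 1):
        cex = bmc(init, succ, prop, k)
        if cex is not None:
            return ("violated", cex)
        if k_induction(states, init, succ, prop, k):
            return ("verified", k)
    return ("unknown", None)


# Toy system: a counter modulo 8 that starts at 0 and increases by 2.
STATES = range(8)
INIT = [0]


def succ(s: State) -> List[State]:
    return [(s + 2) % 8]


print(portfolio(STATES, INIT, succ, lambda s: s != 5, 6))  # ('verified', 4)
print(portfolio(STATES, INIT, succ, lambda s: s != 6, 6))  # ('violated', [0, 2, 4, 6])
```

The first property is not 1-inductive (state 3 satisfies it but steps to 5), yet it becomes provable at k = 4, which is why a proving engine is typically run with increasing bounds next to a bug-finding one.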

The typical workflow of the tool is sketched in Figure 2. At the beginning, the user creates an analysis project providing as input the AUTOSAR configuration of the system. The tool transforms the AUTOSAR configuration into an internal set of analysis models. Since the AUTOSAR standard deals neither with requirements nor with formal properties and their verification, EVA adopts the extended AUTOSAR metamodel defined in [4] to support such concepts.

The user then completes the configuration of the system and provides:


Fig. 2. The analysis workflow.

requirements for the case study of §2:

(1) If the user brakes, the cruise control shall disengage within 2 steps.

(2) The signals of the brake pedal sensors shall be merged.

(3) Even if at most one brake pedal sensor is faulty, if the user brakes, the cruise control shall disengage.

(1) and (2) are component-level requirements assigned to CruiseControl and BrakeCommand respectively, while (3) is a system-level requirement assigned to the composite AUTOSL and refined by (1) and (2).

contracts: the user formalizes the requirements into contracts. Precisely, a contract consists of (optional) assumptions (properties that shall be satisfied by the environment) and assertions (properties that the owner of the contract shall satisfy), expressed as formulas in Linear Temporal Logic (LTL) with some metric extensions (interpreted over discrete time). The user can assign a contract either to a runnable or to a (composite) component.

```
in the future within [2,2] (4)
it shall always be that
   (CCActive and Brake is greater than 0) implies
      in the future within [0,2] (not next(CCActive))
holds true
```
Contract (4) is the formal representation of requirement (1) and it is assigned to the periodic runnable of the CruiseControl component<sup>5</sup>. It is worth noting that EVA provides a smart contract editor that assists the user with context completion, syntax highlighting and error detection. Also, to aid readability of contracts, EVA uses some syntactic sugar to represent temporal operators, such as "in the future" for F or "it shall always be" for G.
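The correspondence between the syntactic sugar and the temporal operators can be sketched as a simple rewriting. This is an illustration only, not EVA's parser: the function `desugar` is hypothetical, only the two phrase-to-operator mappings stated in the text are used, and reading `within [a,b]` as a metric bound attached to the preceding operator is an assumption. The input is the contract listing above with the equation label (4) dropped.

```python
import re

# Only these two correspondences are stated in the text.
SUGAR = {
    "in the future": "F",
    "it shall always be": "G",
}

CONTRACT = """
in the future within [2,2]
it shall always be that
   (CCActive and Brake is greater than 0) implies
      in the future within [0,2] (not next(CCActive))
holds true
"""


def desugar(contract: str) -> str:
    """Replace the documented phrases with operator letters."""
    text = " ".join(contract.split())  # collapse line breaks and indentation
    for phrase, operator in SUGAR.items():
        text = text.replace(phrase, operator)
    # Assumption: "within [a,b]" bounds the operator that precedes it.
    return re.sub(r"\b([FG]) within \[(\d+),(\d+)\]", r"\1[\2,\3]", text)


print(desugar(CONTRACT))
```

The remaining phrases ("that", "is greater than", "holds true") are left untouched because the text does not document how they map to the underlying logic.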

The user can then create a new functional verification analysis, which allows performing:


The result of both analyses can be that the contract is verified or violated. In case of contract violation, EVA returns a counterexample (and the corresponding test case, if the performed analysis is code verification). The user can fix the code or change the system configuration (refining requirements or rescheduling runnables) and then execute the analysis again. The user can optionally apply local changes to the shared analysis models (typically after a contract has been verified).

In addition to the main features above, two further analyses are provided:


### 4 Experimental Evaluation

In order to evaluate the effectiveness and performance of EVA, we applied it to the verification of all the 43 requirements (10 system-level, 33 component-level) of the case-study application described in §2. Due to lack of space, we cannot report the results in detail and we shall limit our analysis to some qualitative considerations about the overall performance of EVA and the usefulness of the produced outputs. Full details on the obtained results will be included in the submitted artifact.

<sup>5</sup> We omit the contracts derived from (2) and (3) for lack of space (their formalization shall be included in the artifact accompanying this submission).

Performance considerations. We verified all the requirements on a PC running Ubuntu Linux 20.04, with a 2.6 GHz Intel Core i7-6600U CPU and 20 GB of RAM. EVA successfully completed 42 out of 43 verification tasks within the timeout (set to 1 hour). For component-level properties, it required less than one second in nearly half of the cases and less than one minute for all the remaining tasks except one. For such problems, the main bottlenecks identified during the case study involved the use of complex floating-point operations, which are still handled inefficiently by the verification backend. The verification of the 10 system-level properties could also be completed relatively efficiently, with EVA requiring less than one minute in 7 cases, and approximately 30 minutes for the hardest one. In this case, the main factor affecting performance (besides the expected ones, such as the number of involved contracts and their complexity and length) is the set of constraints on the composition of components defined in the input model. In particular, performance is affected significantly when the contract under analysis involves periodic components with very different activation periods. The presence of periods that range from a few milliseconds to seconds poses a conceptual/theoretical challenge because the reasoner must explore a large number of small steps of the more frequent tasks for each step of the slow ones. Optimizations targeting this issue are left as directions for future work.
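The effect of heterogeneous periods can be quantified with simple arithmetic. The sketch below uses hypothetical periods (5 ms, 20 ms and 1 s), which are not taken from the case study, and computes how many activations of each task fit in one hyperperiod, i.e. the least common multiple of the periods.

```python
from math import lcm  # available since Python 3.9
from typing import Dict, List, Tuple


def steps_per_hyperperiod(periods_ms: List[int]) -> Tuple[int, Dict[int, int]]:
    """Return the hyperperiod and the number of activations of each task in it."""
    hyper = lcm(*periods_ms)
    return hyper, {p: hyper // p for p in periods_ms}


# Hypothetical activation periods in milliseconds.
hyper, steps = steps_per_hyperperiod([5, 20, 1000])
print(hyper)  # 1000
print(steps)  # {5: 200, 20: 50, 1000: 1}
```

With these numbers, a single step of the slowest task spans 200 steps of the fastest one, so a bounded property over a few slow steps already requires reasoning over hundreds of fine-grained steps.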

Issues discovered. During verification, several counterexamples were discovered. Most of them turned out to be due to incorrect formalizations of requirements or missing environment assumptions, which could be easily fixed by examining the produced counterexamples. The analyses, however, also revealed a number of real bugs in the implementations of some of the software components, as well as two issues due to incorrect scheduling of components. The first was caused by a mismatch between the Simulink description of the CruiseControl periodic runnable and its C implementation in the AUTOSAR application. Specifically, the mismatch was due to different assumptions about the rate of execution of the cruise control step with respect to the rate of change of the inputs, which caused the input values to be read only at even steps of the cruise controller. The second issue concerned the scheduling of the BrakeCommand runnable, which was set to be executed only upon changes in the input pedal positions. A counterexample in the contract refinement showed that the validity of these input signals could change value without the BrakeCommand running, so that the pedal position was not propagated to the CruiseControl. The model was fixed by adding a trigger of the BrakeCommand that is also associated with the valid signal of the pedal positions. In both cases, the bugs could be fixed by analyzing the counterexamples generated by EVA.

### 5 Data Availability Statement

The artifact described in the paper is not publicly available due to internal policy. Any requests can be directed to the corresponding author.

### References


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## WASIM: A Word-level Abstract Symbolic Simulation Framework for Hardware Formal Verification<sup>⋆</sup>

Wenji Fang<sup>1(✉)</sup> and Hongce Zhang<sup>1,2</sup>

<sup>1</sup> The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China wfang838@connect.hkust-gz.edu.cn

<sup>2</sup> The Hong Kong University of Science and Technology, Hong Kong, China hongcezh@ust.hk

Abstract. This paper demonstrates the design and usage of WASIM, a word-level abstract symbolic simulation framework with pluggable abstraction/refinement functions. WASIM is useful in the formal verification of functional properties on register-transfer level (RTL) hardware designs. Users can control the symbolic simulation process and tune the level of abstraction by interacting with WASIM through its Python API. WASIM can be used to directly check formal properties on symbolic traces or to extract useful fragments from symbolic representations to construct safe inductive invariants as a correctness certificate. We demonstrate the utility of WASIM on the verification of two pipelined hardware designs. WASIM and the case studies are available under open-source license at: [9].

Keywords: Formal verification · symbolic simulation · abstraction refinement.

### 1 Introduction

Formal property verification (FPV) plays an essential role in hardware verification. Symbolic simulation is one of the model checking techniques used for FPV. It explores all paths of the design circuit simultaneously with symbolic values to work around the state explosion problem [6].

In this paper, we present WASIM, a word-level abstract symbolic simulation framework with customizable abstraction/refinement functions. In the practice of hardware formal verification, we consider guidance from human verification engineers to be the key to scaling formal techniques up to industrial-size designs. Therefore, in WASIM, we emphasize easy user interaction that allows engineers to freely control the simulation process and plug in their own design-specific abstraction functions. WASIM can also ensure its trustworthiness through a certificate (an inductive invariant) constructed from the traces of symbolic simulation.

<sup>⋆</sup> The work has been supported in part by Guangdong Basic and Applied Basic Research Fund no. 2022A1515110178; by Guangzhou-HKUST(GZ) Joint Funding Scheme no. SL2022A03J01288; and by Guangzhou Basic Research Project no. SL2022A04J00615.

Fig. 1. Workflow of WASIM

Figure 1 demonstrates the workflow of WASIM. We highlight some of its features below:


The remainder of this paper is organized as follows. The next section describes the functionalities of WASIM, followed by a short presentation of the user interface in Sect. 3. Sect. 4 reports the results of the case studies. Sect. 5 discusses related work. Finally, Sect. 6 concludes the paper.

### 2 WASIM Functionalities

The WASIM framework is built on top of PySMT [11], a unified interface for multiple SMT solvers. The functionalities are described below.

#### 2.1 Input Processing.

The input Verilog circuits are initially processed by the open-source synthesis suite Yosys and transformed into the Btor2 format [15], an efficient word-level representation for a state transition system (STS). WASIM consumes Btor2 with a parser modified from CoSA (CoreIR Symbolic Analyzer) [14].

#### 2.2 Representing Simulation States using SMT formulas.

The state in WASIM is represented using SMT formulas, with one for each state variable assignment. There are also assumptions (SMT formulas) associated with each state. The assumptions capture the additional constraints on a symbolic trace, for example, certain input combinations will never happen. The state is reachable (realizable) if all assumptions are satisfiable. The state representation may also include undetermined values ('X' values). We keep a special set of SMT variables to represent the 'X' values.
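The three ingredients of a state described above (per-variable assignments, assumptions, and a dedicated set of 'X' variables) can be pictured with a small data structure. The sketch below is a plain-Python illustration that encodes expressions as nested tuples instead of PySMT formulas; the class `SymState` and its helpers are hypothetical names and are not WASIM's actual classes.

```python
from dataclasses import dataclass, field
from itertools import count

_fresh = count()  # global counter used to generate unique 'X' names

@dataclass
class SymState:
    assignments: dict = field(default_factory=dict)  # state variable -> expression
    assumptions: list = field(default_factory=list)  # constraints on the trace
    x_vars: set = field(default_factory=set)         # names that stand for 'X' values

    def new_x(self, hint):
        """Create a fresh variable that stands for an undetermined ('X') value."""
        name = f"X_{hint}_{next(_fresh)}"
        self.x_vars.add(name)
        return ("var", name)

def free_vars(expr):
    """Collect the variable names occurring in a tuple-encoded expression."""
    if expr[0] == "var":
        return {expr[1]}
    if expr[0] == "const":
        return set()
    return set().union(*(free_vars(e) for e in expr[1:]))

def has_x(state, sv):
    """True if the assignment of state variable `sv` mentions any 'X' value."""
    return bool(free_vars(state.assignments[sv]) & state.x_vars)

s = SymState()
s.assignments["acc"] = ("add", ("var", "a"), ("var", "b"))  # fully symbolic
s.assignments["tmp"] = s.new_x("tmp")                       # undetermined
s.assumptions.append(("ne", ("var", "a"), ("const", 0)))    # e.g. input a is never 0
```

Keeping the 'X' names in a separate set makes it cheap to ask whether an expression still depends on an undetermined value, which is the question the simplification step has to answer.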

#### 2.3 Symbolic Simulation.

Symbolic simulation is mainly achieved through substitution. Variables in the transition function of an STS are substituted by variable assignments from the previous cycle. Unassigned inputs or unknown state variables are replaced by 'X' values. WASIM can either explore the states of the next cycle (single-step simulation) or traverse a set of states until no new (abstract) states are found (multi-step simulation). Expression simplification and abstraction are used in WASIM to reduce the size of the state representation.
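A minimal sketch of substitution-based single-step simulation is given below. It uses tuple-encoded expressions and a toy two-stage pipeline; the function `sim_one_cycle` and the naming scheme for 'X' placeholders are assumptions made for illustration and do not reproduce WASIM's implementation.

```python
def free_vars(expr):
    """Collect the variable names occurring in a tuple-encoded expression."""
    kind = expr[0]
    if kind == "var":
        return {expr[1]}
    if kind == "const":
        return set()
    return set().union(*(free_vars(e) for e in expr[1:]))

def substitute(expr, env):
    """Replace every variable in `expr` by its expression in `env`, if present."""
    kind = expr[0]
    if kind == "var":
        return env.get(expr[1], expr)
    if kind == "const":
        return expr
    return (kind,) + tuple(substitute(e, env) for e in expr[1:])

def sim_one_cycle(transition, state, inputs, cycle):
    """One symbolic cycle: plug the previous assignments into the transition function.
    Unassigned inputs and unknown state variables become 'X' placeholders."""
    env = {**state, **inputs}
    needed = set().union(*(free_vars(f) for f in transition.values()))
    for name in sorted(needed - set(env)):
        env[name] = ("var", f"X_{name}_{cycle}")
    return {sv: substitute(f, env) for sv, f in transition.items()}

# Toy two-stage pipeline: s1' = in + 1 and s2' = s1.
transition = {"s1": ("add", ("var", "in"), ("const", 1)), "s2": ("var", "s1")}
state0 = {"s1": ("const", 0)}                           # s2 is unknown initially
state1 = sim_one_cycle(transition, state0, {"in": ("var", "a")}, cycle=1)
state2 = sim_one_cycle(transition, state1, {}, cycle=2)  # input left unassigned
```

After the second cycle, `s2` holds the symbolic value `a + 1` that entered the pipeline one cycle earlier, while `s1` depends on an 'X' placeholder because no input was supplied.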

#### 2.4 Expression Simplification.

Expression simplification reduces the size of an SMT formula in the state representation through a combination of techniques. The built-in rewriting functionality in SMT solvers serves as the 'X'-agnostic simplification step. After this first step, WASIM proceeds with 'X'-aware simplification, which checks whether any 'X' value can be reduced given the state assumptions. For example, an 'X' is reducible if it resides in the unreachable branch of an ITE (if-then-else) operator. WASIM traverses the abstract syntax tree of SMT expressions and heuristically guesses and checks which 'X' values are reducible. When a guess is confirmed, WASIM further rewrites the expression to syntactically eliminate the 'X' values. We design several patterns for common rewriting. For the most general case, WASIM falls back to querying the CVC5 [2] SyGuS solver [1] to synthesize a new expression without 'X'.
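The guess-and-check idea for an 'X' in an unreachable ITE branch can be sketched as follows. Exhaustive enumeration over 2-bit values stands in for the SMT solver here, which is an assumption made to keep the example dependency-free; the function names are hypothetical, and only a single rewrite pattern is shown.

```python
from itertools import product

WIDTH = 2  # toy bit-width; all values are in range(4)

def evaluate(expr, env):
    """Evaluate a tuple-encoded expression under a concrete environment."""
    kind = expr[0]
    if kind == "var":
        return env[expr[1]]
    if kind == "const":
        return expr[1]
    args = [evaluate(e, env) for e in expr[1:]]
    if kind == "add":
        return (args[0] + args[1]) % 2 ** WIDTH
    if kind == "eq":
        return int(args[0] == args[1])
    if kind == "ite":
        return args[1] if args[0] else args[2]
    raise ValueError(f"unknown operator {kind}")

def models(variables, assumptions):
    """Enumerate every assignment to `variables` that satisfies all assumptions."""
    for values in product(range(2 ** WIDTH), repeat=len(variables)):
        env = dict(zip(variables, values))
        if all(evaluate(a, env) for a in assumptions):
            yield env

def x_is_reducible(expr, x, others, assumptions):
    """Check step: under the assumptions (which must not mention x),
    the value of `expr` never depends on x."""
    for env in models(others, assumptions):
        outcomes = {evaluate(expr, {**env, x: v}) for v in range(2 ** WIDTH)}
        if len(outcomes) > 1:
            return False
    return True

def prune_ite(expr, variables, assumptions):
    """Rewrite step for one pattern: drop an ITE branch that is never taken."""
    if expr[0] != "ite":
        return expr
    taken = {bool(evaluate(expr[1], env)) for env in models(variables, assumptions)}
    if taken == {True}:
        return expr[2]
    if taken == {False}:
        return expr[3]
    return expr

# ite(stall = 0, a + 1, X) under the assumption stall = 0
no_stall = ("eq", ("var", "stall"), ("const", 0))
expr = ("ite", no_stall, ("add", ("var", "a"), ("const", 1)), ("var", "X"))
simplified = prune_ite(expr, ["stall", "a", "X"], [no_stall])
```

Under the assumption that the pipeline is not stalled, the else-branch is unreachable, so the expression is rewritten to `a + 1` and the 'X' disappears syntactically.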

#### 2.5 Abstraction Refinement.

We allow users to define abstraction functions that map a concrete state into an abstract domain. A simple example of such an abstraction is to leave out certain registers in the symbolic state representation by replacing them with 'X' values. The abstraction can be design-specific: engineers familiar with the hardware microarchitecture may have better ideas on which registers to omit. Therefore, we give such freedom to WASIM users and allow them to specify their own abstraction functions. Abstraction is also essential to efficient state traversal because it is almost impossible to traverse the concrete state space of a large hardware design. When it is hard to pre-determine the best abstraction function, users can specify a refinement function and perform dynamic abstraction refinement during symbolic simulation. An example of an abstraction-refinement function is given in Sect. 3.2.

### 3 User Interface

WASIM provides a Python interface to control the simulation, apply abstraction or refinement and manipulate the symbolic expressions in state representations.

#### 3.1 Simulation Process Control.

WASIM provides a single-step simulation function sim\_one\_step for forward symbolic simulation of one clock cycle. Users can perform bounded-step simulation by using the function in a range-based loop.

On the other hand, there is often a need for unbounded simulation. WASIM provides an unbounded simulation function traverse\_all\_states. As its name suggests, this function instructs the simulator to search for all symbolic states that are reachable from the current state. Users may optionally provide a termination condition, and the simulator will only search for reachable states before the condition becomes true. This is useful, for example, when searching for all symbolic states in which an instruction is stalled in a certain pipeline stage.
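The difference between bounded and unbounded simulation can be sketched with a generic worklist search. The functions `successors` and `traverse` below are hypothetical stand-ins, not WASIM's API, and the reading of the termination condition (states that satisfy it are recorded but not expanded) is an assumption. Abstract states are modelled as plain integers denoting the pipeline stage an instruction currently occupies.

```python
def successors(stage):
    """Toy pipeline: each cycle the instruction either stalls or advances (max stage 3)."""
    return [stage, stage + 1] if stage < 3 else [stage]

def traverse(initial, successors, stop=None):
    """Search for all states reachable from `initial` until no new state is found.
    States for which `stop(state)` holds are recorded but not expanded further."""
    seen, worklist = [initial], [initial]
    while worklist:
        state = worklist.pop()
        if stop is not None and stop(state):
            continue
        for nxt in successors(state):
            if nxt not in seen:
                seen.append(nxt)
                worklist.append(nxt)
    return seen

# Bounded simulation: apply the single-step function in a range-based loop.
frontier = [0]
for _ in range(2):
    frontier = sorted({n for s in frontier for n in successors(s)})

# Unbounded simulation, without and with a termination condition.
all_states = traverse(0, successors)
until_stage2 = traverse(0, successors, stop=lambda s: s == 2)
```

The search terminates because the set of distinct abstract states is finite; with concrete states the same loop would generally not reach a fixpoint in reasonable time, which is why abstraction matters for unbounded traversal.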

#### 3.2 Customizable Abstraction/Refinement Function.

Users may provide a callable Python object as the abstraction/refinement function. The abstraction function should transfer one symbolic state to its counterpart in the abstract domain, while the refinement function returns a list of states.

Here we give an example of user-specified dynamic abstraction refinement during symbolic simulation. In microprocessor verification, we can use symbolic simulation to check that the arithmetic processing pipeline is functionally correct by computing the output symbolic state from symbolic pipeline inputs. There are external signals coming into the pipeline that only affect latency rather than the arithmetic function. Abstraction can be applied to omit all external signals; however, the final abstract symbolic state might become too coarse. A refinement function can lazily bring back the external signals and branch the execution based on certain signal combinations, until the final symbolic states are sufficiently accurate to check for functional correctness. This example requires the simulator to have a pluggable interface for abstraction/refinement functions.
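The shape of such a pair of functions can be sketched as two Python callables: an abstraction that maps one state to one state, and a refinement that maps one state to a list of states. The factory names, the `X` marker, and the choice to refine by case-splitting on constants 0 and 1 are assumptions for illustration; the exact signatures WASIM expects are not given in the text.

```python
X = ("x",)  # marker for an undetermined value in this sketch

def make_abstraction(omitted):
    """Build an abstraction function that forgets the listed state variables."""
    def abstract(state):
        return {sv: (X if sv in omitted else e) for sv, e in state.items()}
    return abstract

def make_refinement(signal, values=(0, 1)):
    """Build a refinement function that brings `signal` back by case-splitting."""
    def refine(state):
        if state.get(signal) != X:
            return [state]  # already precise for this signal
        return [{**state, signal: ("const", v)} for v in values]
    return refine

# Hypothetical state: an accumulator and an external stall signal.
state = {"acc": ("add", ("var", "a"), ("var", "b")), "stall": ("var", "stall_in")}
abstract = make_abstraction({"stall"})
refine = make_refinement("stall")

coarse = abstract(state)   # 'stall' is replaced by X
branches = refine(coarse)  # one state per value of 'stall'
```

Applying the refinement only when the coarse state turns out to be insufficient keeps the number of branches small, which matches the lazy strategy described above.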

#### 3.3 Symbolic State Extraction and Manipulation.

In order to use the result of symbolic simulation, WASIM allows users to freely extract and manipulate the symbolic expressions in a state representation. Simulation traces are available as Python lists. Users can collect all states in any simulation step and obtain the expression assigned to any state variable. By checking the satisfiability of the conjunction of all variable assignments, the assumptions, and the negated property, users can check for property violations on a symbolic state. WASIM can also evaluate arbitrary functions over state variables given the variable assignment. This is useful to compute the symbolic value of wires in Verilog. Finally, users may re-assign an intermediate state and restart the simulation from that point.
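The violation check described above asks whether the conjunction of the assignments, the assumptions, and the negated property is satisfiable. The sketch below performs this check by brute-force enumeration over 2-bit values instead of calling an SMT solver, and represents expressions as Python functions over an environment; both choices, and the name `find_violation`, are assumptions made for illustration.

```python
from itertools import product

def find_violation(assignments, assumptions, prop, symbols, width=2):
    """Search for values of the free symbols such that the assignments and
    assumptions hold but the property fails. Returns the witness or None."""
    for values in product(range(2 ** width), repeat=len(symbols)):
        env = dict(zip(symbols, values))
        env.update({sv: expr(env) for sv, expr in assignments.items()})
        if all(a(env) for a in assumptions) and not prop(env):
            return env
    return None

# Property of a toy multiply-accumulate datapath: out = a * b + c (mod 4).
spec = lambda e: e["out"] == (e["a"] * e["b"] + e["c"]) % 4

good = {"out": lambda e: (e["a"] * e["b"] + e["c"]) % 4}
buggy = {"out": lambda e: (e["a"] * e["b"]) % 4}  # forgets the accumulate
c_is_zero = lambda e: e["c"] == 0                 # an assumption on the trace

ok = find_violation(good, [], spec, ["a", "b", "c"])
cex = find_violation(buggy, [], spec, ["a", "b", "c"])
masked = find_violation(buggy, [c_is_zero], spec, ["a", "b", "c"])
```

The last call shows why assumptions must be chosen with care: under the assumption `c = 0` the buggy datapath is indistinguishable from the correct one.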

Symbolic state extraction and manipulation enable two use cases: formal property verification and inductive invariant construction. Users can achieve formal property verification by checking the violation of properties on all abstract simulation states extracted from symbolic state traversal. Fragments of expressions in symbolic states are also helpful in the construction of inductive invariants, which could serve as the certificate for the abstract state traversal. For example,

$$(sv_1 = expr_1) \land (sv_2 = expr_2) \land \dots$$

indicates that the STS resides in one (abstract) symbolic state, where sv<sub>1</sub>, sv<sub>2</sub>, ... are the state variables and expr<sub>1</sub>, expr<sub>2</sub>, ... are the symbolic expressions in the state representation. By taking the disjunction of such formulas over all reachable abstract symbolic states, we cover the whole abstract state space; therefore, the disjunction constitutes an inductive invariant for this STS. To certify that a specific safety property is valid, one can build on this inductive invariant with additional expression fragments to create a safe inductive invariant.
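The conditions that make such a disjunction a safe inductive invariant (it holds initially, it is preserved by every transition, and it implies the property) can be checked explicitly on a toy STS. The sketch below enumerates a small concrete state space instead of issuing SMT queries; the counter design and all names are hypothetical.

```python
from itertools import product

def is_safe_inductive_invariant(states, init, step, inv, prop):
    """Check initiation, consecution, and safety by exhaustive enumeration."""
    initiation = all(inv(s) for s in states if init(s))
    consecution = all(inv(t) for s in states if inv(s) for t in step(s))
    safety = all(prop(s) for s in states if inv(s))
    return initiation and consecution and safety

# Toy STS with two state variables (cnt, done): cnt counts 0, 1, 2 and wraps;
# done is set exactly when cnt is 2.
all_states = list(product(range(4), [False, True]))
init = lambda s: s == (0, False)

def step(s):
    cnt, _ = s
    nxt = 0 if cnt >= 2 else cnt + 1
    return [(nxt, nxt == 2)]

prop = lambda s: s[0] != 3  # safety property: cnt never reaches 3

# One conjunct (cnt = e1) and (done = e2) per reachable state; inv is their disjunction.
reachable = [(0, False), (1, False), (2, True)]
inv = lambda s: any(s == r for r in reachable)

# A disjunction that misses a reachable state is not closed under the transition.
too_small = lambda s: s in [(0, False), (1, False)]

valid = is_safe_inductive_invariant(all_states, init, step, inv, prop)
rejected = is_safe_inductive_invariant(all_states, init, step, too_small, prop)
```

The rejected candidate illustrates the role of the check as a certificate: if the traversal had missed a reachable abstract state, the consecution condition would fail.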

### 4 Case Studies

We demonstrate the usage of WASIM with two verification case studies on pipelined hardware designs. The design statistics are shown in Table 1, including the number of state bits and logic gates.

Designs under verification. The first design is a simple arithmetic pipeline with two variants implemented with or without external stall signals. They share the same datapath that performs a multiply-accumulate (MAC) operation. The second design is a simple 3-stage pipeline that resembles the backend of a processor core. It contains data forwarding logic and the control logic to handle external stall signals. Verification in this case study checks if these hardware designs are implemented with the correct functions. Despite the relatively small size, some are already nontrivial for a symbolic model checker.

Users' input. For the simple MAC without stall signals, users only need to provide a simulation script with bounded simulation steps. For all other designs, certain stages may be stalled by external signals for a period of time. The simulation script instructs the simulator to case-split based on the value of the external stall signals and symbolically explore all stalled states in each step. The abstraction function only keeps the concrete representation downstream of the stalled stage; therefore, there are only a small number of stalled states in the abstract domain. Finally, users may check that the given properties are valid on every symbolic path, and the symbolic expressions in the state representations are used to construct parts of inductive invariants. The inductive invariants are further checked to ensure the correctness of the simulation process given the user-provided abstraction functions.

Results of the experiment. In the experiments, we compare with the IC3/PDR symbolic model checking method implemented in Berkeley-ABC. The last three columns in Table 1 are the time of symbolic simulation, the time of checking functional properties on all traces, and the time for checking the validity of inductive invariants. Results show that for the 3-stage-pipe-\* problems, with proper guidance from a human verification engineer, symbolic simulation can outperform autonomous model checking with an order-of-magnitude speed-up. The results are obtained on a server running Ubuntu 20.04 with a 2.9 GHz Intel Xeon(R) Platinum 8375C CPU and 128 GB of RAM.

Table 1. Experimental Results

### 5 Related Works

Apart from WASIM, VossII [16] is another tool for hardware symbolic simulation, which implements the symbolic trajectory evaluation (STE) method [12,13]. VossII operates mainly at the bit level, using binary decision diagrams (BDDs) as the state representation. Several extensions to the original STE method have been proposed so far. For example, generalized STE (GSTE) enables unbounded property verification using assertion graphs [18], and word-level STE (WSTE) achieves a higher level of abstraction with word-level variables in bit-fields [7]. These extensions are typically only available in a commercial STE implementation. Moreover, users must be fluent in a domain-specific functional programming language named fl in order to use VossII.

On the other hand, tools based on symbolic model checking are broadly available for hardware formal verification, for example, Berkeley-ABC [5], which is a powerful open-source tool implementing a collection of various model checking algorithms [3,4,8]. Unlike symbolic simulation, symbolic model checking runs autonomously to prove or falsify given properties without user interactions. However, without proper human guidance, model checking tools may suffer more from the scalability problem.

### 6 Conclusions

In this paper, we present the design and usage of WASIM, a word-level abstract symbolic simulation framework. WASIM features a Python user interface and pluggable abstraction/refinement functions that help human verification engineers bring in their insights to better scale formal methods for hardware designs. Applications of WASIM include formal property verification and inductive invariant generation. Our case studies show that this strategy can be helpful for some problems that are hard for autonomous model checking.

#### Data Availability Statement

The data that support the findings of this study are openly available in WASIM: A Word-level Abstract Symbolic Simulation Framework for Hardware Formal Verification at https://doi.org/10.5281/zenodo.7247147, reference number [10]. The authors confirm that the data supporting the findings of this study are available within the article and its supplementary materials.

#### References



## Multiparty Session Typing in Java, Deductively

Jelle Bouma<sup>1</sup>, Stijn de Gouw<sup>1</sup>, and Sung-Shik Jongmans<sup>1,2(✉)</sup>

<sup>1</sup> Open University of the Netherlands, Heerlen, the Netherlands ssj@ou.nl

<sup>2</sup> Centrum Wiskunde & Informatica (CWI), Amsterdam, the Netherlands

Abstract. Multiparty session typing (MPST) is a method to automatically prove safety and liveness of protocol implementations relative to specifications. We present BGJ: a new tool to apply the MPST method in combination with Java. The checks performed using our tool are purely static (all errors are reported early at compile-time) and resource-efficient (near-zero cost abstractions at run-time), thereby addressing two issues of existing tools. BGJ is built using VerCors, but our approach is general.

### 1 Introduction

Construction and analysis of distributed systems is hard. One of the challenges is this: given a specification S of the roles and the protocols an implementation I of processes and communication sessions should fulfil, can we prove that I is safe and live relative to S? Safety means "bad" communication actions never happen: if a channel action happens in I, then it is allowed by S. Liveness means "good" communication actions eventually happen (communication deadlock freedom). Multiparty session typing (MPST) [14,15] is a method to automatically prove safety and liveness of protocol implementations. The idea is shown in Figure 1:


The following simple example demonstrates global types and local types in Scribble notation [28], as used in the Scribble tool [16,17] for the MPST method.

Example 1. The Adder protocol [12] consists of two roles: Client (C) and Server (S). Client either asks Server to add two numbers (Add-message with two Int-payloads) or tells Server goodbye (Bye-message). In the former case, Server tells Client the result (Res-message). This is repeated until Server is told goodbye.

Fig. 5: Workflow of API-generation-based tools for the MPST method

Figure 2 shows three example runs as sequence diagrams. Figure 3 shows the global type. Notation "m(t<sub>1</sub>, ..., t<sub>n</sub>) from p to q" specifies the communication of a message of type m with payloads of types t<sub>1</sub>, ..., t<sub>n</sub> from role p to role q. Notation "choice at r { G<sub>1</sub> } or ··· or { G<sub>k</sub> }" specifies a choice among branches G<sub>1</sub>, ..., G<sub>k</sub> made by role r. Figure 4 shows the local type for Client. The notation for local types resembles the notation for global types, except that communications are broken up into sends ("m(t<sub>1</sub>, ..., t<sub>n</sub>) to q") and receives ("from p"). □

A premier approach to apply the MPST method in combination with mainstream programming languages is based on API generation (Figure 5); it is used in the majority of MPST tools, including Scribble [16,17], its extensions [32,5,25,22,8,23,9,27,35], StMungo [21], νScr [34], mpstpp [20], and Pompset [6]. The main ideas, first conceived by Deniélou/Hu/Yoshida and pursued in Scribble, follow two insights: (a) local types can be interpreted as deterministic finite automata (DFA) [10,11], where every transition models a send/receive action; (b) DFAs can be encoded as object-oriented application programming interfaces (API) [16,17], where classes and methods model states and transitions.

Example 2. Figure 6 shows the DFA and a Java API for Client in Adder (Example 1), in the style of Scribble. Transition labels of the form q!m(t<sub>1</sub>, ..., t<sub>n</sub>) and p?m(t<sub>1</sub>, ..., t<sub>n</sub>) in the DFA specify the send to q and the receive from p of a message of type m with payloads of types t<sub>1</sub>, ..., t<sub>n</sub>. Classes State1, State2, and State3 in the API correspond to states 1, 2, and 3 of the DFA; the methods of class Statei in the API correspond to the transitions from state i in the DFA.

Figure 7 shows a process for Client, using the Java API. The idea is to write method client that consumes an "initial state object" s1 as input and produces a "final state object" s3 as output. First, the only communication actions that can be performed are those for which s1 has a method. When called, the communication action is performed and a fresh "successor state object" s2 (line 4) or s3 (line 8) is returned. Next, the only communication actions that can be performed are those for which s2 or s3 has a method. And so on. By using state objects in this way, a run of method client simulates a run of the DFA. □

Fig. 6: DFA and Java API for Client in Adder (Scribble-style)

```
1 State3 client(State1 s1) {
2   int x = 1; int y = 2;
3   while (x + y < 100) {
4     State2 s2 = s1.sendAddToS(x, y);
5     int[] buff = new int[1];
6     s1 = s2.recvResFromS(buff);
7     x = y; y = buff[0]; }
8   State3 s3 = s1.sendByeToS();
9   return s3; }
```
Fig. 7: Process for Client in Adder

However, existing API-generation-based tools that follow Example 2 in MPST practice do not fully meet the promise of MPST theory, in two ways:

1. Mixed static/dynamic checks: To ensure safety and liveness, every non-final state object must be used linearly (exactly one method call). However, the type systems of most mainstream programming languages are too weak to check linear usage statically. Instead, dynamic checks are needed (e.g., method use in Figure 6). As a result, MPST practice is weaker than MPST theory: in MPST practice, some errors are reported late at run-time, whereas in MPST theory, all errors are reported early at compile-time.

2. Resource-inefficient checks: Every time a communication action is performed, a fresh state object is created. This costs time (allocation; garbage collection) and space. As a result, MPST practice is costlier at run-time than MPST theory: in MPST practice, API-encodings of DFA-interpretations of local types have a real footprint (proportionate to the number of communication actions), whereas in MPST theory, local types are zero cost abstractions.
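Both issues can be observed in a small model of the state-object style. The sketch below is written in Python rather than Java and is not Scribble's generated code: the base class `UseOnce`, the snake-case method names, and the way the server's reply is simulated locally are all assumptions made so that the example runs without a network.

```python
class UseOnce:
    """Base class for state objects; each instance permits a single method call."""
    def __init__(self):
        self._used = False

    def _use(self):
        if self._used:
            raise RuntimeError("state object used more than once")
        self._used = True

class State1(UseOnce):
    def send_add_to_s(self, x, y):
        self._use()
        return State2(x + y)  # stand-in for the send; the reply is precomputed

    def send_bye_to_s(self):
        self._use()
        return State3()

class State2(UseOnce):
    def __init__(self, pending):
        super().__init__()
        self._pending = pending

    def recv_res_from_s(self):
        self._use()
        return State1(), self._pending

class State3(UseOnce):
    pass

def client(s1):
    """Same control flow as Fig. 7; also counts the state objects that exist."""
    x, y, allocated = 1, 2, 1  # the initial state object counts as one
    while x + y < 100:
        s2 = s1.send_add_to_s(x, y)
        s1, res = s2.recv_res_from_s()
        x, y = y, res
        allocated += 2         # one State2 and one State1 per iteration
    return s1.send_bye_to_s(), allocated + 1
```

In this model, a second call on an already used state object is only detected when the program runs (a `RuntimeError`), and the number of allocated objects grows with the number of communication actions.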

In this paper, we present BGJ: a new API-generation-based tool to apply the MPST method in combination with Java. The checks performed using BGJ are purely static (all errors are reported early at compile-time) and resource-efficient (near-zero cost abstractions at run-time), thereby addressing the issues above. Instead of building a new static analyser from scratch, we leverage a state-of-the-art deductive verifier for Java, namely VerCors [2]. Under active development for years, VerCors has been used in industrial case studies, too [26,18,30]. We note that our approach is generic, though, while our current tool is VerCors-specific.

```
class DFA {
    int state;  // current state of the DFA (1, 2, or 3)

    //@ ensures Perm(state, write);
    //@ ensures state == 1;
    DFA() { state = 1; }

    //@ context Perm(state, write);
    //@ requires state == 1;
    //@ ensures state == 2;
    void sendAddToS(int x, int y) {
        state = 2; ... }

    //@ context Perm(state, write);
    //@ requires state == 1;
    //@ ensures state == 3;
    void sendByeToS() {
        state = 3; ... }

    //@ context Perm(state, write);
    //@ requires state == 2;
    //@ ensures state == 1;
    int recvResFromS() {
        state = 1; ... } }
```
Fig. 8: Java API for Client in Adder (BGJ-style)

### 2 Usage: BGJ in a Nutshell

BGJ follows the same workflow as in Figure 5. We explain the steps below.

Steps 1-3: global types; local types; DFAs. First, the programmer manually writes a global type in Scribble notation (e.g., Figure 3). Next, BGJ automatically projects the global type to local types, and it automatically interprets the local types as DFAs. This is standard and as usual [16,17].

Step 4: APIs. Next, BGJ automatically encodes the DFAs as APIs. Our approach is to encode a DFA of n states as an API of a single class instead of n classes (Figure 6). At run-time, only one instance of this class is created ("near-zero cost abstraction"); this instance allows any number of usages (method calls). To be able to check that these usages are proper, a key novelty of our approach is that BGJ also generates annotations for method contracts, Hoare-logic-style.

Example 3. Figure 8 shows the Java API for Client in Adder (Example 1), generated using BGJ (cf. Figure 6). Field state of class DFA identifies the current state; the methods of class DFA correspond to transitions. The annotations ("//@ ...") define for each method: a precondition ("requires"; what must be true before a call?), a postcondition ("ensures"; what will be true after?), and a method invariant ("context"; read/write permissions for which fields are needed?). □

Step 5: processes. Last, the programmer manually writes processes using the APIs and automatically verifies proper usage with VerCors (i.e., methods are called only if the preconditions hold). These checks are purely static. If successful, safety relative to the global type and liveness (communication deadlock freedom) are assured; else, a bug is found ("all errors are reported early at compile-time").

```
 1 //@ context Perm(a.state, write);
 2 //@ requires a.state == 1;
 3 //@ ensures a.state == 3;
 4 void client(DFA a) {
 5   int x = 1; int y = 2;
 6   //@ loop_invariant a.state == 1;
 7   while (x + y < 100) {
 8     a.sendAddToS(x, y);
 9     x = y; y = a.recvResFromS(); }
10   a.sendByeToS(); }
```
Fig. 9: Process for Client in Adder

Example 4. Figure 9 shows a process for Client in Adder (Example 1), using the Java API in Figure 8. It resembles Figure 7, except that method client and the loop are annotated with a simple contract and invariant. Using VerCors, we can verify that the methods are called only if the preconditions hold. Conversely, if we duplicate line 8, then VerCors reports an error: consecutively sending two Add-messages is forbidden. This can be detected only dynamically in Figure 7 (i.e., a RuntimeException would be thrown in UseOnce of Figure 6). □
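The effect of the contracts in Figure 8 on a sequence of calls can be mimicked, for straight-line code only, by tracking the state field symbolically without running any communication. The following Python sketch is a simplified illustration and not how VerCors works internally: a deductive verifier reasons about all program paths, including loops via their invariants, whereas `check_calls` (a hypothetical helper) only handles a fixed list of calls.

```python
# Method name -> (required state, ensured state), copied from the contracts in Fig. 8.
CONTRACTS = {
    "sendAddToS": (1, 2),
    "sendByeToS": (1, 3),
    "recvResFromS": (2, 1),
}

def check_calls(calls, start=1):
    """Track the DFA state through a straight-line call sequence.
    Returns the final state, or raises ValueError at the first violated precondition."""
    state = start
    for index, method in enumerate(calls):
        requires, ensures = CONTRACTS[method]
        if state != requires:
            raise ValueError(
                f"call {index} ({method}): requires state {requires}, but state is {state}"
            )
        state = ensures
    return state

# One loop iteration followed by the exit, as in Fig. 9.
final_state = check_calls(["sendAddToS", "recvResFromS", "sendByeToS"])

# Duplicating the send violates the precondition of the second sendAddToS.
try:
    check_calls(["sendAddToS", "sendAddToS"])
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
```

Because the check only inspects contracts, it needs no state objects at run-time, which reflects the motivation for the single-class encoding.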

### 3 Implementation

BGJ is implemented in Java. It reuses the front-end of Scribble for global types, local types, and DFAs in steps 1-3 and, thus, supports the same features (including input branching). The encoder of DFAs as APIs in step 4 is new. It generates two versions of every API: concrete (e.g., Figure 8) and abstract (e.g., Figure 8 without "..."). The concrete API is for running a process. The abstract API, which omits all verification-irrelevant details, is for verifying a process.<sup>3</sup> At run-time, TCP is used to transport messages between processes.

Besides the APIs, BGJ also generates "skeletons" of process code. These skeletons represent the basic control flow (adapted from the DFAs) with send... and recv... method calls in the right places (guaranteed to pass verification). The skeletons can subsequently be filled in with the actual computations.

### 4 Preliminary Evaluation

We obtained first practical experience with BGJ to study its two improvements. Regarding "all errors are reported early at compile-time", we investigated how much time the verification step of VerCors takes for eight example protocols in Scribble's repository [13]. Figure 10 shows the results, averaged over thirty runs, using generated skeletons as process code. A preliminary conclusion is that the extra time can be low enough (worth the effort<sup>4</sup> ) for our approach to be feasible.

Regarding "near-zero cost abstractions at run-time", we investigated the run-time overhead of a Scribble-based process (e.g., Figure 6) vs. a BGJ-based process (e.g., Figure 8) for Client in Adder. We factored out code common to both versions (e.g., actual transport of messages over the wire), to be able to specifically measure the impact of the differences (methodology of Castro et al. [5]). Averaged over thirty runs, the Scribble-based process and the BGJ-based process completed 2<sup>31</sup>−1 (Integer.MAX\_VALUE) iterations in 5221 ms and 974 ms, respectively, a speed-up of roughly 5.4×. Our preliminary conclusion is that our approach is indeed more resource-efficient.

Fig. 10: Time of VerCors (in seconds)

<sup>3</sup> The generated annotations are compatible with VerCors 1.0 and above; VerCors can be used as-is. A limitation of our approach is that VerCors supports only a subset of Java. This affects the set of Java features supported for processes.

<sup>4</sup> Usage of BGJ requires two kinds of effort. First, a method in hand-written process code needs to be annotated if the body uses a generated API. All the other code, typically the vast majority of the program (e.g., business logic, database access), can be tagged to be skipped by VerCors. The few annotations to be added are only about the state of the DFA at the beginning/ending of a method (pre/postconditions), or at the beginning of each iteration (loop invariants). This is similar to the effort of manually tracking state types when using the existing Scribble. Second, the validity of the annotations needs to be checked by VerCors. This is fully automated.

### 5 Conclusion

Related work. The combination of the MPST method and deductive verification is largely unexplored territory. The only other work, by López et al. [24], uses deductive verifier VCC [7] to statically check safety and liveness of C+MPI protocol implementations relative to MPST-based specifications. Their approach is very different from ours, though, as it is not based on API generation.

The approach of encoding DFAs of n states as APIs of a single class was recently studied by Cledou et al. [6], by leveraging advanced features of the type system of Scala 3. Their approach does not address the issues in Section 1, though, whereas our approach does. Previous attempts to address the issue of "mixed static/dynamic checks" either target a programming language with a stronger type system (Rust) [22,8,23,9], or adopt callback-style APIs in the specific context of event-based programming [35,34]. In contrast, our approach does not rely on (the strength of) the type system of the targeted programming language, and it supports traditional procedural/object-oriented programming.

Closest to BGJ is StMungo [21]: the approaches of both tools are similar, but the underlying static analysis techniques differ. BGJ leverages method contracts and deductive verification, while StMungo is based on typestate [33]. A key advantage of using deductive verification is that it immediately opens the door to reasoning about functional correctness (next paragraph).

Future work. There are two next steps. First, now that we have the infrastructure to combine the MPST method and deductive verification, we are keen to explore their further integration to reason about functional correctness of distributed systems. VerCors is based on concurrent separation logic [29,4], so key capabilities to reason about concurrency are already in place. This is connected to work in which separation logic is used to control I/O operations (e.g., Penninckx et al. [31]). Second, while the usage of deductive verification is central to BGJ, our approach does not crucially depend on VerCors: we chose it because it is a fully automated, well-supported deductive verifier for Java, but other tools (e.g., KeY [1], VeriFast [19]) offer opportunities worth investigating, too.

### Data Availability Statement

The artifact is available on Zenodo [3]. It contains: (a) our tool and its dependencies; (b) material to replicate the example in Section 2; (c) material to replicate the experiments in Section 4.

### References




Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## PyLTA: A Verification Tool for Parameterized Distributed Algorithms

Bastien Thomas and Ocan Sankur (✉)

Univ Rennes, Inria, CNRS, Rennes, France ocan.sankur@irisa.fr

Abstract. We present the tool PyLTA, which can model check parameterized distributed algorithms against LTL specifications. The parameters typically include the number of processes and a bound on faulty processes, and the considered algorithms are round-based and either synchronous or asynchronous.

### 1 Introduction

Distributed algorithms, that is, algorithms that run on multiple communicating processes, are used in many domains including scientific computing, telecommunications and blockchain. Standard distributed algorithms typically perform relatively simple tasks such as consensus or leader election [17], but complexity arises from the lack of reliability of the network: some processes may crash, communications may be lost, and faulty processes may send arbitrary messages (Byzantine faults). In this setting, various automated verification techniques have been developed in order to provide guarantees on the executions of such algorithms. Notably, parameterised verification attempts to verify these algorithms for every possible number of processes and faults at once [4].

Threshold automata [14] (TA) are a formalism based on counter abstraction [18] that models asynchronous distributed algorithms with a parameterised number of processes under crash and Byzantine faults. Verification can be performed using a complete encoding to SMT formulas [13]. The decidability of generalisations of these models was studied in [16], while [1] focuses on the complexity of the underlying problems. These algorithms were implemented in the Byzantine model checker ByMC [15]. However, algorithms based on threshold automata require bounding the diameter of the underlying transition system, either in the asynchronous case with bounded protocols (with only finitely many exchanged messages) in [14], or with unbounded messages but in the synchronous case, and for reachability properties only [20]. These techniques are therefore incomplete for threshold automata where such a bound does not exist.

In this article, we introduce PyLTA, a tool for fully verifying parameterised distributed algorithms both in the synchronous and asynchronous cases, without bounding the diameter of the state space or the number of exchanged messages. It is based on layered threshold automata (LTA), a formalism developed in [3] which can be thought of as a form of infinitely repeating threshold automata. These generalise the synchronous TAs used in [20] and can handle both synchronous and asynchronous communication by exploiting notions similar to communication closure [8]. This allows us to verify any LTL formula, including liveness properties, even on algorithms where processes may send unboundedly many messages (unlike [14], where only finite TAs and a fragment of LTL were considered).

Concretely, PyLTA takes as input the LTA description of a parameterised distributed algorithm as well as an LTL specification. It then verifies the specification under all parameter valuations, or finds a counterexample disproving the specification. The tool is meant to provide support for distributed algorithm designers. Indeed, distributed algorithm design is not a single-step process. In practice, the implemented versions of an algorithm often contain additional features or optimizations, and PyLTA can be used to automatically check these variants for counterexamples.

#### 2 Modeling Distributed Algorithms

In order to illustrate the capabilities of PyLTA, we use the Phase King algorithm (Algorithm 1) [2]. In general, the algorithms that can be handled by PyLTA exhibit the following characteristics:


Under these conditions, algorithms can be encoded in an LTA. The last two conditions can often be worked around. For example, we will show throughout this article how Algorithm 1 can be verified despite the fact that the condition on line 6 is not amenable to counter abstraction, as it uses the identity of processes, which is lost in the abstraction.

Algorithm 1 uses the parameters n and t with the condition t < n/4. We introduce an additional parameter f ≤ t which is the actual number of faulty processes: the algorithm does not have access to f, but it is used during verification. Communication closure yields a layered structure of our models: a layer indexed by ℓ ∈ ℕ models the portion of the program that deals with messages tagged with ℓ. In Algorithm 1, the layer ℓ = 2i corresponds to lines 3-5, while layer ℓ = 2i + 1 corresponds to lines 6-12.


Algorithm 1: The Phase King algorithm [2] is a synchronous algorithm that solves binary consensus under t < n/4 Byzantine faults. It executes t+1 rounds, and each round i ∈ {0, ..., t} is further decomposed into two layers (for round i, the layers are named 2i and 2i+1). In layer 2i, the processes broadcast their preferences v, and in layer 2i+1, they update v either to the majority if it is strong enough, or to the preference of the process with id i, which is the king of round i.

We use counter abstraction to model executions of the algorithm, meaning that we define a counter storing the number of processes at each state of the algorithm. Here, our approach differs from other works on threshold automata because we count the number of processes that have been through the state instead of those that are currently in it. It follows that the number of messages m sent during the execution can be accurately deduced from these counter values as the number of processes at states where messages m have been sent. The downside of counter abstraction is that the identities of the processes are lost. Notably, the condition on line 6 needs to be abstracted with a nondeterministic choice.
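The counting scheme can be illustrated with a minimal Python sketch. It is not PyLTA code: the function `visit_counters` and the per-process paths are hypothetical, loosely modeled on the configuration described for Fig. 1. Each counter records how many processes have been through a given state at a given layer.

```python
from collections import Counter

def visit_counters(paths):
    """Count, for each (layer, state), how many processes have been through it."""
    counters = Counter()
    for path in paths:
        for layer, state in enumerate(path):
            counters[(layer, state)] += 1
    return counters

# Hypothetical paths of four correct processes, one state per layer.
paths = [
    ["a1", "b?", "a1"],
    ["a1", "b?", "a1"],
    ["a0", "b?", "a1"],
    ["a0", "b?", "a0"],
]
counters = visit_counters(paths)
print(counters[(0, "a1")], counters[(1, "b?")], counters[(2, "a1")])
```

Reordering the paths yields the same counters, which shows in miniature why process identities are lost in the abstraction.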

Fig. 1: A configuration of the Phase King algorithm (Algorithm 1).

Configurations. PyLTA verifies properties on all reachable configurations. A configuration can be interpreted as a record of events that occurred during an execution. An example is depicted in Fig. 1, which we now explain.

The configuration contains an instantiation of the parameter values (given on the bottom of the figure). Moreover, for each layer index, it specifies the number of correct (i.e. non-faulty) processes that were at a given state at that layer; as well as the number of correct processes that moved from one state to another between consecutive layers.

In Fig. 1, initially, 2 correct processes are at state a<sub>1</sub>, and 2 are at a<sub>0</sub>, for the parameter valuation n = 5, t = 1, f = 1. Recall that layers 2i and 2i+1 correspond to round i, and that the meanings of the states are given in Algorithm 1; in particular, a<sub>x</sub> is the first line of an iteration where variable v has value x. All 4 correct processes go to b<sub>?</sub> at layer 1, which means that the Byzantine process was king at round 0. Then three of them go to a<sub>1</sub> at layer 3, and one of them goes to a<sub>0</sub>, etc. This models the situation where the Byzantine process sent a message (2 × 0 + 1, 1) to the latter process but (2 × 0 + 1, 0) to the others. In the next layer, a correct process is king with value 1 (state k<sub>1</sub>), and one correct process has received a majority of value 1 (state b<sub>1</sub>), but not all correct processes have arrived at layer 4 yet. This configuration thus represents a finite prefix of an execution. When needed, LTL fairness assumptions can ensure that we only consider infinite configurations.

#### 3 Input Format and Usage

The input format is based on layered threshold automata (LTA) defined in [3], which we illustrate on the running example. An input file needs to define three elements: parameters, states and guards.

In PyLTA, the parameters are declared as follows.

```
PARAMETERS: n, t, f
PARAMETER_RELATION: 4*t < n
```
The second line declares a constraint on these parameters, here 4t < n, which is a necessary condition for the correctness of Algorithm 1.

As in our running example, the input format assumes that the states of the considered systems belong to layers. The following line defines two consecutive layers A and B, and specifies that after layer B, we come back to layer A and loop.

```
LAYERS: A, B, A
```

In other terms, this results in the sequence of layers A, B, A, B,.... One can also specify lasso-shaped sequences; for instance, LAYERS: A, B, B would yield the sequence A, B, B, B, ....
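The unfolding of a LAYERS declaration into an infinite sequence can be sketched in Python as follows. This is only an illustration under an assumption inferred from the two examples above: the last name in the declaration repeats an earlier one and marks where the sequence loops back to. The function `layer_sequence` is hypothetical and not part of PyLTA.

```python
from itertools import islice

def layer_sequence(decl):
    """Yield the infinite layer sequence for a LAYERS declaration.

    Assumption: the last name repeats an earlier one and marks the
    position the sequence loops back to.
    """
    *prefix, target = decl
    start = prefix.index(target)   # position the loop returns to
    yield from prefix
    while True:
        yield from prefix[start:]

print(list(islice(layer_sequence(["A", "B", "A"]), 6)))  # ['A', 'B', 'A', 'B', 'A', 'B']
print(list(islice(layer_sequence(["A", "B", "B"]), 5)))  # ['A', 'B', 'B', 'B', 'B']
```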

States can be declared by specifying the name of the layer and the name of the state separated by a period as below.

```
STATES: A.0, A.1
STATES: B.k0, B.0, B.u, B.1, B.k1
```
For instance, the first line defines the states a<sub>0</sub> and a<sub>1</sub> in Figure 1, and the second line defines the rest of the states.

Transitions are defined by distinguishing cases for each state using guards. In Algorithm 1, a process needs to receive more than n/2 + t messages (2i, 1) in order to move from state a<sub>1</sub> (line 3) to b<sub>1</sub> (line 11). These messages can either come from processes in state a<sub>1</sub> or from Byzantine processes. In PyLTA, this condition is called the guard from a<sub>1</sub> to b<sub>1</sub> and it is expressed with the formula 2(a<sub>1</sub> + f) > n + 2t. State names correspond to the number of correct processes that have been at that state, so transitions are declared as follows.

```
FORMULA Afull: A.0 + A.1 + f == n
CASE A.1:
  IF Afull & 2*(A.1 + f) >= n THEN B.k1
  IF Afull & 2*(A.1 + f) > n + 2*t THEN B.1
...
```
The formula Afull is used to enforce synchrony: no process can take a transition before every message was received. We present the other transitions for Algorithm 1 in Table 1. Note that Afull or an equivalent Bfull should also be added each time in order to avoid considering asynchronous executions.
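As a worked check of the guard from a<sub>1</sub> to b<sub>1</sub>, the following Python sketch evaluates it on concrete counter values. It follows the guard as stated in the prose, 2(a<sub>1</sub> + f) > n + 2t, which is the condition a<sub>1</sub> + f > n/2 + t multiplied by 2 to stay within integer arithmetic. The function names are hypothetical; the parameter values n = 5, t = 1, f = 1 are those of Fig. 1.

```python
def afull(a0, a1, f, n):
    # Synchrony: all n processes (correct or faulty) are accounted for in layer A.
    return a0 + a1 + f == n

def guard_a1_to_b1(a0, a1, n, t, f):
    # Strong majority for value 1: 2(a1 + f) > n + 2t, together with Afull.
    return afull(a0, a1, f, n) and 2 * (a1 + f) > n + 2 * t

# n = 5, t = 1, f = 1, so the threshold n + 2t is 7.
print(guard_a1_to_b1(a0=2, a1=2, n=5, t=1, f=1))  # False: 2 * (2 + 1) = 6
print(guard_a1_to_b1(a0=1, a1=3, n=5, t=1, f=1))  # True:  2 * (3 + 1) = 8
print(guard_a1_to_b1(a0=0, a1=3, n=5, t=1, f=1))  # False: Afull fails, 0 + 3 + 1 != 5
```

The third call shows the role of Afull: the majority threshold is met, but the transition stays disabled until every process is accounted for.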

The following instruction is used to declare an LTL specification to be verified on the configurations:

```
WITH
  A.initial: A.0 + A.1 + f == n
  A.one0: A.0 > 0
  B.not_two_kings: B.k0 + B.k1 <= 1
VERIFY: (A.initial & ! A.one0 & G(B -> B.not_two_kings)) -> G(A -> ! A.one0)
```
The instructions between WITH and VERIFY define predicates at given layers, which can be used in the subsequent LTL formula. Here, A.one0 holds when at least one process is in state A.0; and B.not\_two\_kings is used to prevent executions where more than one king is present in a round. These predicates can then be used as propositions of the LTL formula that will be verified.

A layer type name (A or B) inside a formula indicates a predicate that only holds in the corresponding layers. An interpretation of the formula can therefore be the following: "if there are n processes, and no process in A.0, and there is always at most one non-Byzantine king in layers of type B, then at all layers of type A, there is no process in A.0."

Table 1: The guards of the transitions for Algorithm 1. The table on the left is for transitions leaving states of layers ℓ = 2i, and the table on the right is for those with layer ℓ = 2i + 1. Each cell is the guard of the transition from the state of the row to the state of the column.

#### 4 Tool Overview and Usage

PyLTA is written in Python. In addition to counter abstraction and predicate abstraction, PyLTA performs counter-example guided abstraction refinement [6]. Since we are working in an unbounded domain due to parameters, the tool uses an SMT solver to check the realizability of the traces, and refines the abstraction using interpolants produced by the solver [12]. The current version uses MathSAT [5] via PySMT [11]. We use Lark [19] for parsing.

The LTL specification is first negated, and then converted into a Büchi automaton using Spot [10]. The product between this automaton and the predicate abstraction is then built dynamically. We check the language emptiness of the resulting product automaton; if it is empty, then the specification holds. Otherwise, the abstract counterexample is checked for realizability using the SMT solver, and either the counterexample is confirmed, or the abstraction is refined.
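The control flow of this refinement loop can be summarised by the following Python skeleton. It is a generic sketch of counter-example guided abstraction refinement, not PyLTA's implementation: the three callbacks are hypothetical stand-ins for the emptiness check on the product automaton, the SMT realizability check, and interpolant-based refinement, and the `max_rounds` limit is an assumption added so that the sketch always terminates.

```python
def cegar(find_abstract_cex, is_realizable, refine, predicates, max_rounds=100):
    """Generic CEGAR loop (sketch).

    find_abstract_cex(predicates) -> abstract counterexample, or None if none exists
    is_realizable(cex)            -> True if the counterexample is concrete
    refine(cex)                   -> set of new predicates ruling out cex
    """
    predicates = set(predicates)
    for _ in range(max_rounds):
        cex = find_abstract_cex(predicates)
        if cex is None:
            return "valid", predicates        # product language is empty
        if is_realizable(cex):
            return "counterexample", cex      # specification is violated
        predicates |= refine(cex)             # spurious trace: refine and retry
    return "unknown", predicates              # hypothetical round limit reached


# Toy stubs: the abstraction admits a spurious trace until predicate "p" is added.
result = cegar(
    find_abstract_cex=lambda preds: None if "p" in preds else ["spurious"],
    is_realizable=lambda cex: False,
    refine=lambda cex: {"p"},
    predicates=set(),
)
print(result)  # ('valid', {'p'})
```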

We run PyLTA on an input file as follows.

```
python -m pylta [input_file]
```
The output on the file corresponding to our running example is the following:

```
VERIFYING R.initial & ! R.one0 & G (B -> B.not_two_kings) ...
Formula is Valid
```
More details, such as the abstract counterexamples encountered and the added predicates, can be obtained by adding a -v flag. In this case, a single refinement was necessary, which added the predicate B.k0 + B.0 + B.u <= 0.

The verification algorithm does not require user interaction since abstractions are refined automatically. However, any predicate defined in the VERIFY instruction is used in the predicate abstraction, even if it does not appear in the formula. This behaviour provides a way to manually add predicates in order to help with the verification. The tool is distributed under the GNU GPL 3.0 licence and is available at https://gitlab.com/BastienT/pylta.

#### 5 Conclusion

We have presented PyLTA, a tool for verifying parameterised distributed algorithms. Despite the undecidability barrier even in simple versions of the problem [20], PyLTA is able to verify complex properties on distributed algorithms and, unlike previous works, makes no assumptions on bounds on the state space or exchanged messages. As future work, one might explore the use of implicit predicate abstraction [21] to speed up the verification process. Another direction would be to integrate well-ordered functions providing termination arguments [7], as used in [9], which could extend the usability of PyLTA.

### References



## FuzzBtor2: A Random Generator of Word-Level Model Checking Problems in Btor2 Format<sup>\*</sup>

Shengping Xiao<sup>1</sup>, Chengyu Zhang<sup>2</sup>, Jianwen Li<sup>1</sup> (✉), and Geguang Pu<sup>1,3</sup> (✉)

<sup>1</sup> East China Normal University, Shanghai, China spxiao@stu.ecnu.edu.cn, {jwli,ggpu}@sei.ecnu.edu.cn

<sup>2</sup> ETH Zurich, Zurich, Switzerland chengyu.zhang@inf.ethz.ch

<sup>3</sup> Shanghai Trusted Industrial Control Platform Co., Ltd, Shanghai, China

Abstract. We present FuzzBtor2, a fuzzer to generate random word-level model checking problems in Btor2 format. Btor2 is one of the mainstream input formats for word-level hardware model checking and was used in the most recent hardware model checking competition. Compared to bit-level model checking, word-level model checking is a more complex research field at an earlier stage of development. Therefore, it is necessary to develop a tool that can produce a large number of test cases in Btor2 format to test word-level model checkers, whether existing or under development. To evaluate the practicality of FuzzBtor2, we tested the state-of-the-art word-level model checkers AVR and Pono with the generated benchmarks. Experimental results show that both tools contain bugs and are not yet mature, which reflects the practical value of FuzzBtor2.

### 1 Introduction

Model checking plays an influential role in modern hardware design [4]. Its great success is inseparable from propositional methods such as Binary Decision Diagrams (BDDs) [10] and Boolean SATisfiability (SAT) solvers [14]. Since BMC [6] was introduced, influential hardware model checking methods such as IMC [20], IC3 [9], and CAR [18] have all been SAT-based. At the same time, many important efforts have been made to apply SAT-based model checking techniques to word-level verification tasks whose background theory is first-order logic [7,23,11,19,16]. These works all rely on more expressive reasoning techniques, i.e., Satisfiability Modulo Theories (SMT) [3] solvers. As the performance of SMT solvers continues to improve [1,22], word-level hardware model checking has become a promising research area. Word-level reasoning is more powerful and opens up many possibilities for simplification [5]. It is strong evidence that a

© The Author(s) 2023

<sup>\*</sup> Jianwen Li is supported by National Natural Science Foundation of China (Grant #U21B2015 and #62002118) and Shanghai Pujiang Talent Plan (Grant #20PJ1403500). Geguang Pu is supported by National Key Research and Development Program (Grant #2020AAA0107800), and Shanghai Collaborative Innovation Center of Trusted Industry Internet Software.

S. Sankaranarayanan and N. Sharygina (Eds.): TACAS 2023, LNCS 13994, pp. 36–43, 2023. https://doi.org/10.1007/978-3-031-30820-8\_5

Implementing word-level reasoning tools such as SMT solvers and word-level model checkers is much more complex and difficult than implementing bit-level tools. For word-level model checking, which is a developing and immature area, there is an urgent need for a large number of diverse benchmarks that can be used for bug finding and performance evaluation. Responding to this need, we present FuzzBtor2, a fuzzing tool that can generate random word-level model checking problems. We choose Btor2 [21] as the format of output files, which is simple, line-based, and easy to parse. Btor2 is also the current official format for the hardware model checking competition [2]. Most mainstream word-level model checkers support the Btor2 format directly (AVR and Pono [19]) or indirectly (nuXmv [11] and IC3ia [13]). To evaluate whether FuzzBtor2 is practical, we test AVR and Pono, two state-of-the-art word-level model checkers that can read Btor2 files directly, on Btor2 files generated by FuzzBtor2; the generated test cases trigger various errors in both checkers. We expect FuzzBtor2 to become part of the infrastructure for the development of word-level model checkers.

### 2 Word-Level Model Checking and Btor2 Format

We assume that the reader is familiar with standard first-order logic terminology [3]. Words generally refer to terms with bit-vector ranges, optionally combined with other theories. The background theory of Btor2 is the Quantifier-Free theory of Bit Vectors with Arrays extension (QF\_ABV), by which almost all computer system information can be encoded. The invariant property is one of the most important property classes to verify.

A model checking problem consists of a transition system and a property to verify. A transition system is a tuple S = (V, I, T) where


Given a transition system S = (V, I, T), its state space is the set of possible variable assignments. I and T determine the reachable state space of S. The bad property is represented by a formula ¬P over V. A model checking problem can be defined as follows: either prove that P holds for all reachable states of S, or disprove P by producing a counterexample. In the former case, the system is safe, and in the latter, the system is unsafe. Some transition systems have input variables, which can be modeled as state variables whose corresponding next states are unconstrained. Assume that a Btor2 file includes n<sub>s</sub> state variables, n<sub>c</sub> constraints, and n<sub>b</sub> bad properties. Its initial state space consists of n<sub>s</sub> init-formulas. The transition relation consists of n<sub>s</sub> next-formulas and n<sub>c</sub> constraint-formulas, and the bad property consists of n<sub>b</sub> bad-formulas. The sorts of init-formulas and next-formulas should be consistent with the corresponding state variables, and constraint-formulas and bad-formulas are of Boolean sort.
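To make the problem statement concrete, the following Python sketch decides a tiny instance by explicit breadth-first search over the reachable states. This is an illustration only: word-level model checkers such as those discussed here rely on SMT-based techniques rather than state enumeration, and the function `check_invariant` and the toy counter system are hypothetical.

```python
from collections import deque

def check_invariant(init_states, successors, is_bad):
    """BFS over reachable states; return a counterexample path, or None if safe."""
    parent = {s: None for s in init_states}
    queue = deque(init_states)
    while queue:
        s = queue.popleft()
        if is_bad(s):
            path = []
            while s is not None:          # walk back to an initial state
                path.append(s)
                s = parent[s]
            return path[::-1]
        for nxt in successors(s):
            if nxt not in parent:         # visit each state once
                parent[nxt] = s
                queue.append(nxt)
    return None

# Toy system: a 3-bit counter that starts at 0 and adds 2 modulo 8.
def succ(x):
    return [(x + 2) % 8]

print(check_invariant([0], succ, lambda x: x == 5))  # None: 5 is unreachable, safe
print(check_invariant([0], succ, lambda x: x == 6))  # [0, 2, 4, 6]: unsafe
```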

### 3 The FuzzBtor2 Tool

FuzzBtor2 is open-source software consisting of approximately 2400 lines of C++11 code. FuzzBtor2 is self-contained and does not rely on specific libraries. In this section we introduce the usage and architecture of FuzzBtor2. The tool is available at https://github.com/CoriolisSP/FuzzBtor2.

#### 3.1 Usage

The command to execute FuzzBtor2 in Linux systems is ./fuzzbtor [options]. We present the usage and features of FuzzBtor2 along with the options here.








### 3.2 Architecture

The architecture of FuzzBtor2 consists of a preprocessor, a generator, and a printer. Users of FuzzBtor2 only specify some arguments on the command line, and no other input is given. From the command line arguments, the preprocessor sorts out the information required by the generator and saves it as a configuration. According to the configuration, the generator constructs syntax trees that satisfy the requirements on number and sorts stated in Sec. 2. These syntax trees encode a set of first-order logic formulas, which essentially form a model checking problem independent of the Btor2 format. Finally, the printer outputs the syntax trees constructed by the generator in Btor2 format.

The generator is the key component of FuzzBtor2. The generator constructs a syntax tree recursively, that is, a syntax tree with a depth greater than 1 consists of sub-syntax trees, operators, and some possible parameters (only for indexed operators). When the recursive process reaches the base case, i.e., a leaf node of the syntax tree, it randomly decides to return a (state or input) variable or a constant based on a certain probability. Due to the limitation of the number and sort of variables, if the generator chooses to return a variable, it may encounter a situation where the required leaf node cannot be constructed. Therefore, FuzzBtor2 does not guarantee that the Btor2 file can be successfully generated, and some parameters would cause the construction to fail. The overall process of constructing a syntax tree is described in Algorithm 1.
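The recursive construction described above can be sketched in Python as follows. This is not the FuzzBtor2 implementation (which is written in C++) and does not reproduce its Algorithm 1: the operator table `OPS`, the probability `p_var`, the two sort names, and the tuple encoding of trees are all assumptions chosen for illustration. The sketch does show the failure mode mentioned in the text, where a variable leaf of the required sort cannot be built.

```python
import random

class GenerationError(Exception):
    """Raised when no variable leaf of the required sort can be built."""

# Hypothetical operator table: result sort -> list of (operator, argument sorts).
OPS = {
    "bool": [("not", ["bool"]), ("and", ["bool", "bool"]), ("eq", ["bv", "bv"])],
    "bv": [("add", ["bv", "bv"]), ("neg", ["bv"])],
}

def gen_tree(sort, depth, variables, rng, p_var=0.5):
    """Return a random syntax tree of the given sort and depth."""
    if depth <= 1:
        # Base case: a leaf is a variable with probability p_var, else a constant.
        if rng.random() < p_var:
            candidates = [v for v, s in variables.items() if s == sort]
            if not candidates:
                raise GenerationError(f"no variable of sort {sort}")
            return rng.choice(candidates)
        return ("const", sort, rng.randint(0, 1))
    # Recursive case: pick an operator and build one subtree per argument sort.
    op, arg_sorts = rng.choice(OPS[sort])
    return (op, *[gen_tree(s, depth - 1, variables, rng, p_var) for s in arg_sorts])

rng = random.Random(0)
variables = {"x": "bv", "y": "bv", "b": "bool"}
print(gen_tree("bool", 3, variables, rng))
```

Passing a seeded `random.Random` instance keeps generation reproducible, in the same spirit as the `--seed` option used in the experiments.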

### 4 Experimental Evaluation

Tested Tools. In order to evaluate whether FuzzBtor2 is practical, we choose two state-of-the-art word-level model checkers AVR [17] and Pono [19] as tested tools. Both checkers can take Btor2 as direct input format, and won the first and third place respectively in the 2020 Hardware Model Checking Competition [2].


Table 2: Classification and statistics of error messages. The first type of error message of Pono has been confirmed by its developers.


Experimental Setups. We run FuzzBtor2 repeatedly with different parameters to generate a total of 200 test cases, of which 100 cases are array-free, i.e., without array variables (BV), and 100 cases include array variables (ABV). The command of FuzzBtor2 used for the former purpose is fuzzbtor2 --seed i --max-depth 4 --constraints 1 --bv-states 3 --arr-states 0 --max-inputs 3 --candidate-sizes 1..8. To generate Btor2 models with array variables, the command is fuzzbtor2 --seed i --max-depth 4 --constraints 1 --bv-states 2 --arr-states 1 --max-inputs 3 --candidate-sizes 1..8. Here, i takes the values from 0 to 99. For every tested checker, the timeout to solve each instance is set to one hour.
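The 200 invocations can be enumerated with a short Python sketch. It only builds the argument lists, using the flags as they appear in the ABV command above; how the generated Btor2 text is stored (for example, by redirecting standard output) is not specified here, so the sketch does not run the tool. The helper name `fuzzbtor2_command` is hypothetical.

```python
def fuzzbtor2_command(seed, bv_states, arr_states):
    """Build the argument list for one FuzzBtor2 call (flags taken from the text)."""
    return [
        "fuzzbtor2", "--seed", str(seed), "--max-depth", "4",
        "--constraints", "1", "--bv-states", str(bv_states),
        "--arr-states", str(arr_states), "--max-inputs", "3",
        "--candidate-sizes", "1..8",
    ]

# 100 array-free (BV) calls and 100 calls with one array state variable (ABV).
bv_cmds = [fuzzbtor2_command(i, bv_states=3, arr_states=0) for i in range(100)]
abv_cmds = [fuzzbtor2_command(i, bv_states=2, arr_states=1) for i in range(100)]

print(" ".join(bv_cmds[0]))
print(" ".join(abv_cmds[99]))
```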

Correctness. We use catbtor provided by btor2tools<sup>4</sup> [21] to verify the correctness of outputs of FuzzBtor2. All Btor2 files generated by FuzzBtor2 pass the check of catbtor, which means all Btor2 models generated by FuzzBtor2 are legal in syntax. Moreover, neither of the two tested tools (AVR or Pono) returns error messages that are relevant to the syntax issue of input Btor2 files.

Results. We perform 200 calls to FuzzBtor2 and obtain 100 BV test cases and 98 ABV test cases. Two calls for ABV test cases fail due to the situation discussed in Sec. 3.2. The generated test cases are small, with a maximum of 58 lines, a minimum of 22 lines, and an average of 39.2 lines. We use the 198 generated test cases to find bugs in AVR and Pono. All solving processes return results immediately, regardless of success or failure, except for one ABV case on which AVR times out. Table 1 presents overall statistical results. Neither AVR nor Pono performs very well, since most of the test cases (157 vs. 127) trigger their bugs. Table 2 presents the classification and statistics of error messages returned by the tested tools. We encounter 12 and 6 different types of error messages for AVR and Pono, respectively. It can be seen from Table 2 that ABV test cases trigger more types of errors than BV test cases, which matches the fact that more code is covered when solving a case in a more complex theory. Considering both tables, AVR performs worse than Pono in the experiments: AVR solves fewer test cases and returns more types of error messages. Besides, the case on which AVR times out is solved (Safe) by Pono, and is a Btor2 file with only 43 lines, so we speculate that a performance issue occurs in AVR.

#### 5 Conclusion

We have presented FuzzBtor2, an open-source tool for the generation of random Btor2 files, whose generated test cases can trigger various errors in state-of-the-art word-level model checkers. Several directions for future work are being considered. First, once easy-to-trigger bugs of the tested tools are fixed, we could generate Btor2 files of larger size and, through experiments, select benchmarks that can be used for performance evaluation. Second, there are some keywords of Btor2 (output, fair, and justice) that are not supported by the current FuzzBtor2, and we can extend the functionality of FuzzBtor2 to support them in future versions. Finally, as stated in Sec. 3.2, the set of syntax trees constructed by the generator of FuzzBtor2 is essentially a model checking problem, independent of the Btor2 format. Therefore, it would be useful to print randomly generated model checking problems in other formats such as Smv [8] and Vmt [12].

<sup>4</sup> https://github.com/boolector/btor2tools

Data-Availability Statement The artifact that supports the experimental results is available in Zenodo with the identifier https://doi.org/10.5281/zenodo.7234681 [24].

### References



## **Eclipse ESCET™: The Eclipse Supervisory Control Engineering Toolkit**

W.J. Fokkink<sup>1,2</sup> (✉), M.A. Goorden<sup>3,4</sup>, D. Hendriks<sup>5,6</sup>, D.A. van Beek<sup>1</sup>, A.T. Hofkamp<sup>1</sup>, F.F.H. Reijnen<sup>7</sup>, L.F.P. Etman<sup>1</sup>, L. Moormann<sup>1</sup>, J.M. van de Mortel-Fronczak<sup>1</sup>, M.A. Reniers<sup>1</sup>, J.E. Rooda<sup>1</sup>, L.J. van der Sanden<sup>5</sup>, R.R.H. Schiffelers<sup>1,8</sup>, S.B. Thuijsman<sup>1</sup>, J.J. Verbakel<sup>1</sup>, J.A. Vogel<sup>4</sup>

<sup>1</sup> Eindhoven University of Technology, Eindhoven, The Netherlands <sup>2</sup> Vrije Universiteit Amsterdam, Amsterdam, The Netherlands

w.j.fokkink@vu.nl

<sup>3</sup> Aalborg University, Aalborg, Denmark

<sup>4</sup> Rijkswaterstaat, Utrecht, The Netherlands

<sup>5</sup> TNO-ESI, Eindhoven, The Netherlands

<sup>6</sup> Radboud University, Nijmegen, The Netherlands

<sup>7</sup> Vanderlande Industries, Veghel, The Netherlands

<sup>8</sup> ASML, Veldhoven, The Netherlands

**Abstract.** The Eclipse Supervisory Control Engineering Toolkit (ESCET™) is an open-source project to provide a model-based approach and toolkit for developing supervisory controllers, targeting their entire engineering process. It supports synthesis-based engineering of supervisory controllers for discrete-event systems, combining model-based engineering with computer-aided design to automatically generate correct-by-construction controllers. At its heart is supervisory controller synthesis, a formal technique for the automatic derivation of supervisory controllers from the unrestricted system behavior and system requirements. Vital for the future development of these techniques and tools is the ESCET project's open environment, allowing industry and academia to collaborate on creating an industrial-strength toolkit. We report on some crucial developments of the toolkit in the context of research projects with Rijkswaterstaat and ASML that have considerably improved its capability to deal with the complexity of real-life systems as well as its usability.

### **1 Introduction**

A supervisory controller, supervisor for short, coordinates the behavior of a cyberphysical system according to discrete-event observations of its system behavior. Based on such observations, the supervisor decides which events the system can safely perform and which events must be disabled, because they would lead to violations of requirements or to a blocking state. Engineering of supervisors is a challenging task, due to the high complexity of real-life discrete-event systems.

Supervisory control theory [21] underpins a model-based technique for automatically deriving a model of a supervisor from models of the uncontrolled system behavior and the system's requirements, such as functional or safety-related requirements that intend to rule out all undesired behavior. This is achieved by disabling *controllable* (output) events, such as starting a motor. Supervisors exert no control over *uncontrollable* (input) events, such as sensor reports.

The Eclipse Supervisory Control Engineering Toolkit (ESCET™, pronounced *èsèt*) project<sup>1,2</sup> provides a model-based approach and toolkit for the development of supervisors. It targets the entire engineering process for the development of supervisors, including modeling, synthesis, simulation-based validation and visualization, formal verification, real-time testing, and code generation. This entire process is supported by CIF [1],<sup>3</sup> featuring an automata-based modeling language for convenient specification of large-scale systems, and tools that support synthesis-based engineering (SBE). SBE is an engineering approach to design and implement supervisors that combines model-based engineering with computer-aided design to produce correct-by-construction controllers, by automating the engineering process as much as possible. While not detailed further in this paper, the ESCET project also comprises Chi [28], a hybrid language and toolset for modeling and simulation, developed by the same research group that developed CIF, and the ToolDef scripting language for the definition and execution of model-based toolchains, useful for combining different ESCET tools.<sup>4</sup>

The ESCET project, an Eclipse Foundation open-source project since 2020, builds upon decades of research and tool development at Eindhoven University of Technology. Vital for the evolution from an academic into an industrially applicable toolkit are the years-long ongoing research collaborations with industry, including Rijkswaterstaat [7], ASML [27], and Vanderlande [29]. Rijkswaterstaat, part of the Dutch Ministry of Infrastructure and Water Management, is responsible for infrastructure in the Netherlands, including roads, bridges, tunnels, and waterway locks. ASML is an innovation leader in the semiconductor industry, providing chipmakers with all they need to mass produce patterns on silicon through lithography. Vanderlande is a market leader in logistic process automation for the warehousing, airport and parcel sectors. The quality of supervisory control software for such systems impacts their availability and reliability. Synthesis-based engineering allows for automation, modularization, and standardization, increasing quality and evolvability and decreasing life-cycle costs.

With the move to the Eclipse Foundation, and supported by the Eclipse Foundation's principles of transparency, openness, meritocracy and vendor-neutrality, the ESCET project aims to be an open environment and a growing community. It allows interested parties, such as academic and applied research institutes, industrial partners and tool vendors, to collaborate on and profit from further tool development for the model-based construction of supervisors. Furthermore, the project's open nature allows any vendor to develop commercial tool support.

We report on some crucial developments of the toolkit that have considerably improved its capability to deal with the complexity of real-life systems as well as its usability, as shown by the case studies reported in Section 5.

<sup>1</sup> See https://eclipse.org/escet.

<sup>2</sup> 'Eclipse', 'Eclipse ESCET' and 'ESCET' are trademarks of Eclipse Foundation, Inc.

<sup>3</sup> See https://eclipse.org/escet/cif.

<sup>4</sup> See https://eclipse.org/escet/chi and https://eclipse.org/escet/tooldef.

#### **2 Supervisory Controller Synthesis**

Figure 1 depicts the general system structure for supervisory control. A cyberphysical system consists of mechanical components to be controlled. Actuators drive their operation, while sensors indicate their status. Resource control provides low-level control, often offering more abstract actuator and sensor signals for higher levels of control to use. Supervisors ensure that actuator signals at lower layers (the *plant*) that would violate requirements are disabled. Large systems may be divided into (layers of) subsystems, and supervisors can be present at each level, coordinating lower-level subsystems (only a single layer is depicted). A (sub)system is often controlled by a human operator through a graphical user interface, or is part of a larger system to which it is connected by an interface.

**Fig. 1.** Structure of supervisory control.

Supervisory controller synthesis [21,33] automatically generates a correct-by-construction supervisor model for a discrete-event system, given precise descriptions of the behavior of the plant components as well as the (safety) requirements for the overall plant behavior. These can be specified conveniently as extended finite automata (EFAs), i.e., automata with variables, guards and updates, possibly carrying invariants that restrict the state space [13].

Synthesis considers the synchronous product of the plant automata together with the requirement automata. That is, these automata synchronize on shared events, meaning these events must be executed simultaneously. If an event is missing in the local state of any plant automaton, or is restricted by a plant invariant, it is absent from the overall system state, and it is considered physically impossible. If, on the other hand, an event is missing only in the states of requirement automata, or is restricted by a requirement invariant, it is physically possible but must be disabled by the synthesized supervisor to ensure *safety*.
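As an illustration of this synchronization rule, the following minimal Python sketch computes the synchronous product of two explicitly enumerated automata. The dictionary layout (`init`, `events`, `trans`) and the drawbridge example are assumptions made for this sketch; they are not CIF syntax or part of the ESCET toolkit, and the variables, guards, updates and invariants of EFAs are left out.

```python
def sync_product(a, b):
    """Synchronous product of two automata given as dicts.

    'init' is the initial state, 'events' the alphabet, and 'trans' maps
    (state, event) to the successor state. Shared events are taken jointly;
    events private to one automaton leave the other one where it is.
    """
    init = (a["init"], b["init"])
    events = a["events"] | b["events"]
    trans, seen, todo = {}, {init}, [init]
    while todo:
        state = todo.pop()
        sa, sb = state
        for e in sorted(events):
            ta = a["trans"].get((sa, e)) if e in a["events"] else sa
            tb = b["trans"].get((sb, e)) if e in b["events"] else sb
            if ta is None or tb is None:
                continue  # some participant does not offer e in this state
            trans[(state, e)] = (ta, tb)
            if (ta, tb) not in seen:
                seen.add((ta, tb))
                todo.append((ta, tb))
    return {"init": init, "events": events, "trans": trans, "states": seen}


# Hypothetical plant: a bridge deck that can be raised and lowered.
deck = {"init": "lowered", "events": {"raise", "lower"},
        "trans": {("lowered", "raise"): "raised",
                  ("raised", "lower"): "lowered"}}
# Hypothetical requirement: the deck may only be raised while barriers are closed.
req = {"init": "open", "events": {"close", "open", "raise"},
       "trans": {("open", "close"): "closed",
                 ("closed", "open"): "open",
                 ("closed", "raise"): "closed"}}

prod = sync_product(deck, req)
print(sorted(prod["states"]))
```

In this example the product has no `raise` transition while the barriers are open, because the requirement automaton does not offer that event in its state `open`. A synthesis tool additionally has to remember whether an event was blocked by a plant or by a requirement, which this sketch does not do.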

Controllable events (such as output signals to actuators) can be prevented by a supervisor, but uncontrollable events (such as input signals from sensors) cannot. To ensure *controllability*, if an uncontrollable event must be prevented, the supervisor makes the system state where it occurs unreachable by disabling all controllable events leading to it. Moreover, if an uncontrollable event leads to such a state, the origin state of this event must be made unreachable too.

If safety of, for instance, a drawbridge is ensured by forcing it to remain raised forever, it is useless for road traffic. Therefore states of the plant and requirement EFAs can be marked, for instance states where the bridge deck is lowered, the barriers are open, and the signals are green. A marked state in the synchronous product means all individual plant components are in a marked local state, in this case allowing traffic to proceed over the bridge. The supervisor must guarantee that the plant can always reach a marked state, by disabling (events leading to) states that violate this property. Such a supervisor is said to be *nonblocking*.

Supervisory controller synthesis ensures *safety*, *controllability* and *nonblockingness* of a system with respect to its requirements, accounting for all possible behavior, also disabling events that lead to problems such as blocking behavior or requirement violations much later in the system's execution. It does so by restricting as little behavior as possible, thus ensuring *maximal permissiveness*.
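The interplay of these properties can be illustrated with a textbook-style fixed-point computation on an explicitly enumerated state space. The Python sketch below is a simplification written for this illustration: it is not the symbolic algorithm used by the ESCET toolkit, and the function name `synthesize`, the tuple-based transition format, and the five-state example are assumptions.

```python
def synthesize(states, trans, controllable, marked, forbidden, init):
    """Return (reachable safe states, disabled (state, event) pairs), or None.

    trans is a list of (source, event, target) triples of the plant that is
    already combined with its requirements; forbidden states model
    requirement violations.
    """
    good = set(states) - set(forbidden)            # safety
    while True:
        # Nonblocking: keep states that can reach a marked state inside good.
        coreach = good & set(marked)
        grew = True
        while grew:
            grew = False
            for s, e, t in trans:
                if s in good and s not in coreach and t in coreach:
                    coreach.add(s)
                    grew = True
        # Controllability: drop states with an uncontrollable event leaving the set.
        exposed = {s for s, e, t in trans
                   if s in coreach and e not in controllable and t not in coreach}
        if coreach - exposed == good:
            break                                  # greatest fixed point reached
        good = coreach - exposed
    if init not in good:
        return None                                # no supervisor exists
    # Restrict to the states still reachable under supervision.
    reach, todo = {init}, [init]
    while todo:
        s = todo.pop()
        for u, e, t in trans:
            if u == s and t in good and t not in reach:
                reach.add(t)
                todo.append(t)
    disabled = {(s, e) for s, e, t in trans if s in reach and t not in good}
    return reach, disabled


# Hypothetical example: state 2 is forbidden, state 4 is a deadlock,
# event "u" is uncontrollable, state 3 is marked.
trans = [(0, "a", 1), (1, "u", 2), (1, "x", 3),
         (0, "b", 3), (3, "d", 4), (3, "r", 0)]
result = synthesize(states=range(5), trans=trans,
                    controllable={"a", "b", "d", "r", "x"},
                    marked={3}, forbidden={2}, init=0)
print(result)
```

Here state 1 is removed for controllability (its uncontrollable event `u` leads to the forbidden state), state 4 is removed for nonblockingness, and the supervisor only disables the controllable events `a` in state 0 and `d` in state 3. Because the iteration starts from all non-forbidden states and only removes what it must, the result is maximally permissive.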

Besides the ESCET toolkit, other supervisory controller synthesis tools include DESTool [16], DESUMA [25], Supremica [12], and TCT [6]. For a comparison between these tools, see [24]. The ESCET toolkit has a rich set of concepts, so it can be used to specify the various models needed during the entire development process, including simulation models. This avoids having to use multiple languages. It has a strong focus on industrial application, with, e.g., modeling convenience, efficient algorithms, and checking for common mistakes.

#### **3 Synthesis-based Engineering Process**

Figure 2 shows ESCET's synthesis-based engineering process. It starts with a model-based specification, consisting of plant and requirement models, modeled as EFAs and/or invariants. To these models, supervisory controller synthesis is applied, resulting in a model of the supervisor. The ESCET toolkit supports synthesis both with its own synthesis tools, and by a transformation to Supremica.

Synthesis ensures that all specified requirements are satisfied by the synthesized supervisor. Verification, such as model checking, supported through transformations to UPPAAL [2] and mCRL2 [3], can be used to check other requirements not yet supported by synthesis, including liveness guarantees or timing requirements. Validation, supported by ESCET's automated or interactive simulation and visualization, helps to determine whether the specified requirements, and thus the supervisor, achieve the desired system behavior.

An implementation of the controller can be obtained automatically from a model of the supervisor, by generating code for its control software. The ESCET toolkit supports code generation for multiple languages and platforms, including Java, C, Simulink, and PLC code (IEC standard 61131-3) for multiple vendors.

**Fig. 2.** Simplified representation of ESCET's synthesis-based engineering process.

### **4 Technical Improvements**

We describe recent improvements and novel techniques that have been vital in making supervisory controller synthesis applicable to industrial-size cyberphysical systems. Some have already been integrated into the ESCET toolkit, while others are being integrated or are planned to be integrated.

*Symbolic synthesis* The ESCET toolkit is based on the symbolic supervisory controller synthesis algorithm from Ouedraogo et al. [19]. It iteratively strengthens guard predicates on transitions so that forbidden states become unreachable in the controlled plant. This represents a major step forward for the industrial applicability of supervisory controller synthesis, by allowing for synthesis of plants and requirements intuitively modeled as EFAs.

The use of EFAs also opens up the possibility to extract and represent the synthesized supervisor more compactly and intuitively [15]. The ESCET toolkit represents the supervisor model as the collection of the provided plant and requirement models together with the addition of a single EFA containing a strengthened guard for each controllable event.

*BDD Data Structure* The Binary Decision Diagram (BDD) data structure allows predicates representing (parts of) state spaces to be represented and manipulated efficiently and symbolically [14]. Its use in ESCET's symbolic supervisory controller synthesis algorithm leads to major reductions of state space representations and computation times, which is essential for scalability.

Vital to the memory and running time characteristics of Reduced Ordered BDD representations and manipulations, as used by the ESCET toolkit, is the ordering of the Boolean variables [30]. Heuristic variable ordering algorithms that exploit the inherent structure of the system modeled as EFAs are able to significantly reduce the synthesis effort [11], especially for larger inputs, making synthesis applicable to more complex systems.
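The effect of the variable order can be seen on a classic small example. The sketch below counts the internal nodes of a reduced ordered BDD by brute force, using the fact that the nodes at each level correspond to the distinct subfunctions that still depend on that level's variable. It is an illustration written for this text, not the BDD library or the ordering heuristics used by the ESCET toolkit; the function `robdd_size` and the example predicate `pairs` are assumptions.

```python
from itertools import product


def robdd_size(f, order):
    """Number of internal nodes of the reduced ordered BDD of f under `order`.

    f takes a dict from variable names to 0/1. For level i, every assignment
    to the first i variables yields a subfunction over the remaining ones; it
    needs a node at level i only if it depends on variable order[i].
    """
    n, total = len(order), 0
    for i in range(n):
        subfunctions = set()
        for prefix in product((0, 1), repeat=i):
            fixed = dict(zip(order[:i], prefix))
            table = tuple(
                f({**fixed, **dict(zip(order[i:], suffix))})
                for suffix in product((0, 1), repeat=n - i)
            )
            half = len(table) // 2
            # First half: order[i] = 0, second half: order[i] = 1.
            if table[:half] != table[half:]:
                subfunctions.add(table)
        total += len(subfunctions)
    return total


def pairs(v):
    """(x1 and y1) or (x2 and y2) or (x3 and y3)."""
    return bool((v["x1"] and v["y1"]) or (v["x2"] and v["y2"])
                or (v["x3"] and v["y3"]))


interleaved = ["x1", "y1", "x2", "y2", "x3", "y3"]
grouped = ["x1", "x2", "x3", "y1", "y2", "y3"]
print(robdd_size(pairs, interleaved), robdd_size(pairs, grouped))  # 6 14
```

For this predicate, placing each variable next to its partner needs 6 internal nodes, while listing all `x` variables before all `y` variables needs 14, and the gap grows exponentially with the number of pairs. Ordering heuristics aim to find orders of the first kind by keeping strongly related variables close together.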

*Multilevel Synthesis* Contrary to monolithic synthesis, where only a single supervisor is synthesized, with multilevel synthesis [10] the plant components and requirements are grouped together into a hierarchical structure, and a separate supervisor is synthesized for each group. This allows the control problem to be distributed over multiple cooperating supervisors, which together are significantly smaller than one monolithic supervisor. By encoding relations between plant components and requirements in a design structure matrix [5], and algorithmically reordering its rows and columns to place tightly coupled plant components side by side [32], a suitable multilevel structure can be obtained. Compared to monolithic synthesis, this can for certain systems substantially reduce synthesis effort [8], enabling synthesis for much larger variants of such systems.

*Avoiding Nonblockingness Checks* Although the local supervisors in multilevel synthesis are nonblocking, the overall supervisor may not be. A global nonblockingness check can be used to guarantee that all local supervisors can reach a marked state at the same moment in time, but is often expensive, nullifying much of the gains obtained through applying multilevel synthesis. However, in a dependency graph that encodes which plant components depend, by means of requirements, on the state of other plant components to perform certain events, plant components do not give rise to blocking behavior if they are not part of an infinite path [9]. For certain systems, using such graphs, the global nonblockingness checks may be skipped entirely, or may be reduced to consider fewer subsystems.

*Symmetry Reduction* Real-life systems tend to contain a significant number of similar components, that for instance only differ by the instantiation of some of the parameters or their physical locations within the overall system. Such symmetries can be exploited to reduce the number of plant and requirement automata needed in the synthesis process, further reducing the synthesis effort [18].

#### **5 Case Studies and Applications**

*Rijkswaterstaat* Initially the collaboration with Rijkswaterstaat focused on generating control software with supervisory controller synthesis for bridges, waterway locks, and storm surge barriers. Notable case studies are the Algera complex, comprising a bascule bridge, a waterway lock and two storm surge barriers in the river Hollandse IJssel [22], and the Oisterwijksebaanbrug, a rotating bridge in Tilburg [23]. For the latter, a fault-tolerant controller was synthesized, from which PLC code was generated, which passed the regular site acceptance test.

Recent case studies target road tunnels, notably the Eerste Heinenoord tunnel [18] and the Swalmen tunnel [17], and roadside systems [31]. For the Swalmen tunnel, a digital twin, a 3D digital copy of a physical system, was conveniently constructed from the plant and requirement models. Combined with visualization, this allows simulation of the system in a setting close to real life.

*ASML* A prominent result of the collaboration with ASML is the use of the ESCET toolkit in a toolkit from another Eclipse Foundation open-source project, the Eclipse Logistic Specification and Analysis Toolkit (LSAT™) [26]. The LSAT toolkit is used at ASML to create fully calibrated models of subsystems of a wafer scanner, responsible for transporting wafers in and out of the scanner and performing preprocessing steps before the wafer is exposed on the wafer stage subsystem. The LSAT toolkit exploits ESCET's supervisory controller synthesis to compute valid orderings of logistics activities, while maintaining the maximum freedom to subsequently perform scheduling on the synthesis result to compute a supervisor that optimizes productivity [20].

#### **6 Conclusions**

The ESCET project and toolkit support synthesis-based engineering to efficiently generate high-quality correct-by-construction supervisors. The toolkit is being applied to complex industrial systems in different domains. The project's open environment enables effective collaboration between industry, researchers and tool vendors. Owing to positive experiences with the ESCET toolkit, Rijkswaterstaat is seriously considering whether its document-based development process for control software could be adapted to one based on SBE with the ESCET toolkit.

### **7 Data-Availability Statement**

The artifact that supports this paper is available at Zenodo under identifier doi:10.5281/zenodo.7296616 [4]. It contains Eclipse ESCET v0.7 for Linux. However, the authors prefer that the Eclipse ESCET toolkit is downloaded directly from the Eclipse Foundation, where the latest version of the toolkit is available for multiple platforms.<sup>5</sup>

### **References**


<sup>5</sup> See https://eclipse.org/escet/download.html.



## **Combinatorial Optimization/Theorem Proving**

## New Core-Guided and Hitting Set Algorithms for Multi-Objective Combinatorial Optimization

João Cortes, Inês Lynce, and Vasco Manquinho

INESC-ID - Instituto Superior Técnico, Universidade de Lisboa, Lisbon, Portugal {joao.o.cortes,ines.lynce,vasco.manquinho}@tecnico.ulisboa.pt

Abstract. In the last decade, numerous algorithms for single-objective Boolean optimization have been proposed that rely on the iterative usage of a highly effective Propositional Satisfiability (SAT) solver. But the use of SAT solvers in Multi-Objective Combinatorial Optimization (MOCO) algorithms is still scarce. Due to this shortage of efficient tools for MOCO, many real-world applications formulated as multi-objective are simplified to single-objective, using either a linear combination or a lexicographic ordering of the objective functions to optimize.

In this paper, we extend the state of the art of MOCO solvers with two novel unsatisfiability-based algorithms. The first is a core-guided MOCO solver. The second is a hitting set-based MOCO solver. Experimental results in several sets of benchmark instances show that our new unsatisfiability-based algorithms can outperform state-of-the-art SAT-based algorithms for MOCO.

### 1 Introduction

Whenever facing a decision, there is often a set of objectives to optimize. For instance, when making a vacation plan with multiple destinations, one wants to minimize both the time spent in airports and the money spent on plane tickets. However, seldom can one obtain a solution that optimizes all objectives at once. It is usually the case that decreasing the value of an objective results in increasing the value of another. This occurs in many application domains [17,22,32].

In order to deal with multi-objective problems, we usually cast them into single-objective ones. For example, this can be achieved by defining a linear combination of the objective functions. Another option is to define a lexicographic order of the objectives [24], but this may result in unbalanced solutions where the first function is minimized while the remaining ones have a very high value.

In the multi-objective scenario, we are looking for Pareto-optimal solutions, i.e. all solutions for which decreasing the value of one objective function increases the value of another. After determining the set of all such solutions, known as the Pareto front, one can select a representative subset and present it to the user [9].

Frameworks based on stochastic search have been developed to approximate the Pareto front of Multi-Objective Combinatorial Optimization (MOCO) problems [6,33]. Several algorithms were also proposed based on iterative calls to a satisfiability checker, such as the Opportunistic Improvement Algorithm [8], among others [16]. Additionally, the Guided-Improvement Algorithm (GIA) [26] is implemented in the optimization engine of the Satisfiability Modulo Theories (SMT) solver Z3 for finding Pareto-optimal solutions of SMT formulas. New algorithms have also been proposed based on the enumeration of Minimal Correction Subsets (MCSs) [30] or P-minimal models [28]. A common thread to these algorithms is that they follow a SAT-UNSAT approach. A path diversification method has also been proposed where unsatisfiable cores are identified in order to cut the path generation procedure [31]. More recently, Maximum Satisfiability (MaxSAT) approaches have been used for MOCO [12,10], but the proposed algorithms are limited to two objective functions.

S. Sankaranarayanan and N. Sharygina (Eds.): TACAS 2023, LNCS 13994, pp. 55–73, 2023. https://doi.org/10.1007/978-3-031-30820-8_7

In this paper, we propose two new algorithms for MOCO. The first algorithm is a core-guided approach that relies on encodings of the objective functions to effectively cut the search space in each SAT call. Additionally, we also propose a hitting set-based approach where the previous core-guided algorithm is used to enumerate a multi-objective hitting set. Note that these are the first algorithms for MOCO that take full advantage of unsatisfiable core identification over several objectives, as well as the first MOCO algorithm based on a hitting set approach, taking advantage of the duality between Pareto-MCSs [30] and unsatisfiable cores over several objectives. Experimental results show that the new algorithms proposed in this paper are complementary to the existing SAT-based algorithms for MOCO, thus extending the state-of-the-art tools for MOCO based on SAT technology.

The paper is organized as follows. Section 2 defines the MOCO problem and the standard notation used in the remainder of the paper. Next, Sections 3 and 4 describe the new core-guided and hitting set-based algorithms for MOCO. Experimental results and comparisons with other SAT-based algorithms are provided in Section 5. Finally, conclusions are presented in Section 6.

### 2 Preliminaries

We start with the definitions that fall in the SAT domain. Next, we introduce the definitions specific to solving the MOCO problem.

Definition 1 (Boolean Satisfiability problem (SAT)). Consider a set of Boolean variables $V = \{x_1, \ldots, x_n\}$. A literal is either a variable $x_i \in V$ or its negation $\neg x_i \equiv \bar{x}_i$. A clause is a set of literals. A Conjunctive Normal Form (CNF) formula $\phi$ is a set of clauses. A model $\nu$ is a set of literals, such that if $x_i \in \nu$, then $\bar{x}_i \notin \nu$ and vice versa.

The truth value of $\phi$, denoted by $\nu(\phi)$, is a function of $\nu$, and is defined recursively by the following rules. First, the truth value of a literal is given by $\nu(x_i) = \top$ if $x_i \in \nu$, $\nu(x_i) = \bot$ if $\bar{x}_i \in \nu$, and $\nu(\neg x_i) = \neg \nu(x_i)$. Secondly, a clause $c$ is true iff it contains at least one literal assigned to true. Finally, formula $\phi$ is true iff it contains only true clauses,

$$\nu(\phi) \equiv \bigwedge_{c \in \phi} \nu(c), \quad \nu(c) \equiv \bigvee_{l \in c} \nu(l). \tag{1}$$

The model ν satisfies the formula φ iff ν(φ) is true. In that case, ν is (φ-)feasible. Given a CNF formula φ, the SAT problem is to decide if there is any model ν that satisfies it or prove that no such model exists.
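Equation (1) translates directly into code. The following Python sketch uses a DIMACS-like convention chosen for this illustration (the positive integer `i` stands for $x_i$ and `-i` for its negation); the helper names are assumptions, and the exhaustive `solve` is only meant for tiny formulas, not as a stand-in for a real SAT solver.

```python
from itertools import product


def value_clause(nu, clause):
    """A clause is true iff at least one of its literals is in the model."""
    return any(lit in nu for lit in clause)


def value_formula(nu, phi):
    """A CNF formula is true iff all of its clauses are true, as in (1)."""
    return all(value_clause(nu, c) for c in phi)


def solve(phi, n):
    """Try every total model over x_1..x_n; return a satisfying one or None."""
    for signs in product((1, -1), repeat=n):
        nu = {s * i for s, i in zip(signs, range(1, n + 1))}
        if value_formula(nu, phi):
            return nu
    return None


# (x1 or x2) and (not x1 or x2) and (not x2 or x3)
phi = [{1, 2}, {-1, 2}, {-2, 3}]
model = solve(phi, 3)
print(model, solve([{1}, {-1}], 1))
```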

Our algorithms require a SAT solver to be used as an Oracle. If the formula is satisfiable, the solver returns a satisfying assignment. Otherwise, it returns an explanation of unsatisfiability, called a core.

Definition 2 (Core κ). Given a CNF formula $\phi$, we say a formula $\kappa$ is an unsatisfiable core of $\phi$ iff $\kappa \subseteq \phi$ and $\kappa \models \bot$.

Definition 3 (SAT solver). Let φ be a CNF formula and α a conjunction of unit clauses. We call φ the main formula and α the assumptions. A SAT solver solves the CNF<sup>1</sup> instance of the working formula ω = φ ∪ α, i.e. decides on the satisfiability of ω.

A query to the solver is denoted by φ-SAT(α). The value returned is a pair (ν, κ), containing a feasible model ν and a core of assumptions κ, i.e. a subset of the assumptions α contained in some core of ω. If the working formula ω is not satisfiable, ν does not exist, and the call returns (∅, •). If ω is satisfiable, the call returns (•, ∅).
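The shape of such a query can be mimicked on small formulas by brute force. The Python sketch below is an assumption-laden illustration: literals are integers as in the DIMACS convention, `sat_query` and `find_model` are hypothetical names, and the core is found by trying subsets of the assumptions in order of size. A real SAT solver derives a core from its conflict analysis and gives no guarantee that the core is smallest.

```python
from itertools import combinations, product


def find_model(clauses, n):
    """Exhaustively search for a total model over x_1..x_n."""
    for signs in product((1, -1), repeat=n):
        nu = {s * i for s, i in zip(signs, range(1, n + 1))}
        if all(any(lit in nu for lit in c) for c in clauses):
            return nu
    return None


def sat_query(phi, alpha, n):
    """Return (model, core) for the working formula phi plus unit clauses alpha.

    Satisfiable: (model, empty core). Unsatisfiable: (empty model, core),
    where the core is a smallest subset of alpha inconsistent with phi and
    is empty when phi alone is already unsatisfiable.
    """
    nu = find_model(list(phi) + [{a} for a in alpha], n)
    if nu is not None:
        return nu, set()
    for size in range(len(alpha) + 1):
        for subset in combinations(alpha, size):
            if find_model(list(phi) + [{a} for a in subset], n) is None:
                return set(), set(subset)


phi = [{1, 2}]                          # x1 or x2
print(sat_query(phi, [-1, 3], 3))       # satisfiable under the assumptions
print(sat_query(phi, [-1, -2, 3], 3))   # the core blames -1 and -2, not 3
```

The second query shows why cores are useful: the assumption on $x_3$ plays no role in the conflict, so only the assumptions in the core need to be revisited.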

Definition 4 (Relaxing/Tightening a formula). Given $\phi$, a formula $\psi$ is a relaxation of $\phi$ iff $\phi \models \psi$. We also say $\psi$ relaxes $\phi$. Conversely, $\phi$ tightens $\psi$.

Next we review Pseudo-Boolean formulas and optimization and define the MOCO problem.

Definition 5 (Pseudo-Boolean function, clause, formula (PB)). A linear function $\{0, 1\}^n \to \mathbb{N}$ given by

$$g(\mathbf{x}) = g(x_1, \ldots, x_n) = \sum_i w_i x_i, \quad w_i \in \mathbb{N}, \quad x_i \in V, \tag{2}$$

is called an (integer linear) PB function. Expressions like $g(\mathbf{x}) \bowtie k$, with $\bowtie \in \{\le, \ge, =\}$, are called PB clauses. A PB formula is a set of PB clauses. For some model $\nu : V \to \{0, 1\}$, let $\mathbf{x}$ be the Boolean tuple $\nu(V) \equiv (\nu(x_1), \ldots, \nu(x_n))$. Given a formula $\phi$, a model $\nu$ is said to be ($\phi$-)feasible if it satisfies every clause in $\phi$. The set of Boolean tuples $Z(\phi) = \{\mathbf{x} = \nu(V) \in \{0, 1\}^n : \nu(\phi)\}$ is called the feasible space of the formula $\phi$, and its elements $\mathbf{x}$ are called feasible points. Any subset of the feasible space is called a $\phi$-feasible set.

Definition 6 (Pseudo-Boolean Optimization (PBO)). Let $\phi$ be a PB formula, and $f$ be a PB function. The PBO problem is to minimize the value of the objective $f$ over the feasible space $Z(\phi)$ of the formula $\phi$. That is,

$$\text{find } \arg\min_{\mathbf{x} \in Z(\phi)} f(\mathbf{x}). \tag{3}$$
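On small instances, Definitions 5 and 6 can be checked by enumeration. In the Python sketch below, a PB clause is represented as a triple `(weights, op, k)`; this representation, the helper names, and the example instance are assumptions made for illustration, and the exhaustive search is not how PBO solvers work in practice.

```python
import operator
from itertools import product

OPS = {"<=": operator.le, ">=": operator.ge, "=": operator.eq}


def pb_value(weights, x):
    """Value of the PB function sum_i w_i * x_i at the Boolean tuple x."""
    return sum(w * xi for w, xi in zip(weights, x))


def feasible_space(phi, n):
    """All tuples in {0,1}^n that satisfy every PB clause of phi."""
    return [x for x in product((0, 1), repeat=n)
            if all(OPS[op](pb_value(w, x), k) for w, op, k in phi)]


def pbo(phi, f, n):
    """A feasible point minimizing the PB function f, or None if infeasible."""
    Z = feasible_space(phi, n)
    return min(Z, key=lambda x: pb_value(f, x)) if Z else None


phi = [((1, 1, 1), ">=", 2)]   # x1 + x2 + x3 >= 2
f = (3, 1, 2)                  # minimize 3*x1 + x2 + 2*x3
best = pbo(phi, f, 3)
print(best, pb_value(f, best))  # (0, 1, 1) 3
```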

<sup>1</sup> We may use a PB formula (Definition 5) and assume it is translated to CNF.

Multi-objective optimization generalizes PBO and builds upon a criterion of comparison (or order) of tuples of numbers. The most celebrated one is called Pareto order or dominance.

Definition 7 (Pareto partial order (≺)). Let $Y$ be some subset of $\mathbb{N}^n$. For any $y, y' \in Y$,

$$\begin{aligned} y \preceq y' &\iff \forall i,\ y_i \leq y'_i, \\ y \prec y' &\iff y \preceq y' \land y \neq y'. \end{aligned}$$

We say $y$ dominates $y'$ iff $y \preceq y'$. We say $y$ strictly-dominates $y'$ iff $y \prec y'$.

Given a tuple of objective functions sharing a common domain $X$, we can compare two elements $x, x' \in X$ by comparing the corresponding tuples in the objective space.

Definition 8 (Pareto Dominance (≺)). Let $F : X \to Y \subseteq \mathbb{N}^n$ be a multi-objective function, mapping the decision space $X$ into the objective space $Y$. For any $x, x' \in X$,

$$\begin{aligned} x \prec x' &\iff F(x) \prec F(x'),\\ x \preceq x' &\iff F(x) \preceq F(x'). \end{aligned}$$

We say $x$ dominates $x'$ iff $x \preceq x'$. We say $x$ strictly-dominates $x'$ iff $x \prec x'$.
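Both comparisons are short to write down. The sketch below is a direct Python transcription of Definitions 7 and 8; the function names and the two-objective function `F` are assumptions chosen for the example.

```python
def dominates(y, y2):
    """y dominates y2: no component of y is larger than that of y2."""
    return all(a <= b for a, b in zip(y, y2))


def strictly_dominates(y, y2):
    """y strictly dominates y2: y dominates y2 and the tuples differ."""
    return dominates(y, y2) and tuple(y) != tuple(y2)


def F(x):
    """Hypothetical pair of PB objectives over three Boolean variables."""
    return (2 * x[0] + x[1], x[1] + 3 * x[2])


a, b, c = (0, 1, 0), (1, 1, 0), (0, 0, 1)
print(F(a), F(b), F(c))                     # (1, 1) (3, 1) (0, 3)
print(strictly_dominates(F(a), F(b)))       # True: a is better in objective 1
print(dominates(F(a), F(c)), dominates(F(c), F(a)))  # False False
```

The last line shows why the order is only partial: `a` and `c` are incomparable, since each is better in one objective.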

Contrary to the single-objective case, the consequence of this comparison criterion is that many different good solutions are mapped to different points in the objective space. Therefore, the solution to the problem is actually a set, called the Pareto front.

Definition 9 (Fronts). Given a multi-objective function $F : X \to Y$ and a feasible space $Z \subseteq X$, the Pareto front of $Z$ is the subset $P \subseteq Z$ containing all elements that are not strictly-dominated,

$$P = \{ x \in Z : \nexists x' \in Z : x' \prec x \}.$$

We call img-front the subset $\overline{Y} \subseteq Y$ which is the image of $P$ under $F$,

$$\overline{Y} \equiv \text{img-front}_Z\, F = \{ y \in Y : \exists x \in P : y = F(x) \}.$$

Finally, we call arg-front of $Z$, or simply front of $Z$, any subset $\overline{Z}$ of the Pareto front $P$ which is mapped by $F$ onto $\overline{Y}$ in a one-to-one fashion,

$$\overline{Z} = \text{front}_Z\, F.$$
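The difference between the three sets is easiest to see on a tiny instance. The brute-force Python sketch below follows Definition 9 literally; the function names and the example objective function are assumptions, and the quadratic enumeration is only meant for illustration.

```python
from itertools import product


def strictly_dominates(y, y2):
    return all(a <= b for a, b in zip(y, y2)) and y != y2


def pareto_front(Z, F):
    """All elements of Z that no other element of Z strictly dominates."""
    return [x for x in Z
            if not any(strictly_dominates(F(x2), F(x)) for x2 in Z)]


def img_front(Z, F):
    """Image of the Pareto front in the objective space."""
    return {F(x) for x in pareto_front(Z, F)}


def arg_front(Z, F):
    """One representative of the Pareto front per point of the img-front."""
    reps = {}
    for x in pareto_front(Z, F):
        reps.setdefault(F(x), x)
    return list(reps.values())


def F(x):
    """Hypothetical objectives: setting x[2] makes both of them worse."""
    return (x[0] + x[1] + x[2], (1 - x[0]) + (1 - x[1]) + x[2])


Z = list(product((0, 1), repeat=3))
print(pareto_front(Z, F))  # four points, two of them with equal objectives
print(img_front(Z, F))     # three objective vectors
print(arg_front(Z, F))     # three points, one per objective vector
```

The points `(0, 1, 0)` and `(1, 0, 0)` both map to `(1, 1)`, so the Pareto front has four elements while the img-front and any arg-front have three.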

Definition 10 (Multi-Objective Combinatorial Optimization (MOCO)). Let $F : X \to Y \subseteq \mathbb{N}^n$ be a multi-objective PB function, mapping the decision space $X \subseteq \{0, 1\}^n$ into the objective space $Y$. Let $Z \subseteq X$ be the feasible space of some PB formula $\phi$, with variables in $V$. Then,

$$\text{find } \text{front}_{Z(\phi)} \, F. \tag{4}$$

An instance will be denoted by the triple $\langle \phi, V, F \rangle$.

Because the solutions of the problems are sets, bounds are now bound sets (Definition 13). In the single-objective case, a lower bound is a value $l$ such that $\forall y = f(x) : l \le y$, or equivalently, $\nexists y = f(x) : l > y$. This equivalence is broken by the generalization. Each of the previous defining properties of a lower bound gives rise to a differently flavoured comparison of sets (Definitions 11 and 12).

Definition 11 (Set coverage). Let $A$ and $B$ be subsets of some decision space $X$, equipped with a multi-objective function $F$. Then, $A$ covers $B$ iff every element of $B$ is dominated by some element of $A$, i.e. $\forall b \in B, \exists a \in A : a \preceq b$, and $A$ strictly covers $B$ iff $\forall b \in B, \exists a \in A : a \prec b$.

Definition 12 (Set non-inferiority). Let $A$ and $B$ be subsets of some decision space $X$, equipped with a multi-objective function $F$. Then $A$ is non-inferior to $B$ iff there is no element of $B$ that strictly-dominates an element of $A$, i.e. $\forall a \in A, b \in B : \neg(a \succ b)$, and $A$ is strictly non-inferior to $B$ iff $\forall a \in A, b \in B : \neg(a \succeq b)$. Here $a \succ b$ and $a \succeq b$ stand for $b \prec a$ and $b \preceq a$, respectively.

Note that in the single objective case, non-inferiority and coverage are the same. The next definition correctly generalizes the notion of lower bound.

Definition 13 (Bound sets). $L \subseteq X$ is a (strictly) lower bound set of $Z \subseteq X$ iff $L$ (strictly) covers and is (strictly) non-inferior to $Z$. If $L$ is a lower bound set of $Z$, we say $L \preceq Z$. If it is a strictly lower bound set, we say $L \prec Z$.
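The two conditions of Definition 13 can be checked independently, and small examples show that neither implies the other. The Python sketch below works directly on objective vectors, i.e. it assumes $F$ is the identity; the function names and the example sets are assumptions made for illustration.

```python
def dominates(a, b):
    return all(p <= q for p, q in zip(a, b))


def strictly_dominates(a, b):
    return dominates(a, b) and a != b


def covers(A, B, strict=False):
    """Every element of B is (strictly) dominated by some element of A."""
    dom = strictly_dominates if strict else dominates
    return all(any(dom(a, b) for a in A) for b in B)


def non_inferior(A, B, strict=False):
    """No element of B strictly dominates (strict: dominates) one of A."""
    dom = dominates if strict else strictly_dominates
    return not any(dom(b, a) for a in A for b in B)


def is_lower_bound_set(L, Z, strict=False):
    return covers(L, Z, strict) and non_inferior(L, Z, strict)


Z = [(1, 3), (2, 2), (3, 1)]
print(is_lower_bound_set([(1, 2), (2, 1)], Z))          # True
print(is_lower_bound_set([(1, 2)], Z))                  # False: (3, 1) uncovered
print(is_lower_bound_set([(1, 2), (2, 1), (4, 4)], Z))  # False: (4, 4) is inferior
print(is_lower_bound_set(Z, Z), is_lower_bound_set(Z, Z, strict=True))
```

A set of mutually non-dominated points is a lower bound set of itself, but never a strict one, which mirrors the single-objective facts $l \le l$ and $\neg(l < l)$.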

One way to generate a lower bound set of some Pareto front is to solve a related problem, where the formula is replaced by a relaxed version (Definition 4).

In our approach, we embed dominance relations into CNF formulas. We are interested in removing from the feasible space solutions that are dominated by some other known feasible solution. In order to do this, we make use of unary counters [3,13,14] that have been used to implement efficient PB satisfiability solvers.

Definition 14 (Unary Counter). Let $f_i : \{0, 1\}^n \to \mathbb{N}$ be a PB function and let $V$ be an ordered set of variables that parametrize the domain of $f_i$,

$$V = \{x_1, \ldots, x_n\}, \quad f_i(\mathbf{x}) = f_i(x_1, \ldots, x_n). \tag{5}$$

Consider the CNF formula $\tilde{\phi}$ with variables $V \cup O$, where $O \cap V = \emptyset$ and $O$ contains one variable $o_{i,k}$ for each value $k \in \mathbb{N} : \exists \mathbf{x} : k = f_i(\mathbf{x})$. The elements of $O$ are the order variables. We call the tuple $\langle f_i, V, O, \tilde{\phi} \rangle$ a unary counter of $f_i$ iff all feasible models $\nu$ of $\tilde{\phi}$ satisfy

$$f_i(\mathbf{x}) \ge k \implies o_{i,k}, \quad \mathbf{x} = \nu(V). \tag{6}$$
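Implication (6) only runs in one direction, so an order variable may be true even when the objective is small; what bounds the objective is an assumption that sets an order variable to false. The Python sketch below makes this concrete by enumeration. It does not build a CNF encoding: the functions `image`, `respects_counter` and `fenced`, and the example weights, are assumptions made purely for illustration.

```python
from itertools import product


def f_value(weights, x):
    return sum(w * xi for w, xi in zip(weights, x))


def image(weights):
    """Sorted values k attained by f, one order variable o_k for each."""
    n = len(weights)
    return sorted({f_value(weights, x) for x in product((0, 1), repeat=n)})


def respects_counter(weights, x, o):
    """Implication (6): f(x) >= k implies o[k], for every order variable."""
    return all(o[k] for k in o if f_value(weights, x) >= k)


def fenced(weights, k_false):
    """Points x that some assignment to O allows when o[k_false] is assumed false."""
    ks = image(weights)
    points = set()
    for x in product((0, 1), repeat=len(weights)):
        for bits in product((False, True), repeat=len(ks)):
            o = dict(zip(ks, bits))
            if not o[k_false] and respects_counter(weights, x, o):
                points.add(x)
    return sorted(points)


weights = (1, 2)            # f(x) = x1 + 2*x2 with image {0, 1, 2, 3}
print(image(weights))       # [0, 1, 2, 3]
print(fenced(weights, 2))   # [(0, 0), (1, 0)]: exactly the points with f(x) < 2
```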

#### 3 Core-Guided Algorithm

Although core-guided algorithms for Maximum Satisfiability were initially proposed more than a decade ago [7,23,21,2,1], there is no such algorithm for MOCO. Hence, our goal is to take advantage of unsatisfiable cores identified by a SAT solver in order to lazily expand the allowed search space.

Fig. 1: Illustration of a run of Core-Guided (Algorithm 1) in the objective space. The img-front is the set {1, 2, 3}. The fence bound λ gets updated at each iteration of the while loop at line 6, starting at A and ending at Ω. The arrows are guided by the core κ (line 19). The green shading represents the evolution of the fence. Darker regions have been fenced for longer. The blue regions are blocked by optimal points. Darker regions are dominated by more points. We will be done in 7 iterations. After verifying that A is not feasible, we are instructed by the cores κ to move along the diagonal twice. We find point 1 fenced. Therefore the associated x is copied into I and the dominated region is blocked. We extend λ twice, and find point 2. After moving once more, we find part of the fence blocked, and the point branded with i is never generated. The next movement stations λ at Ω. Point 3 is found. The Oracle acknowledges we are done, by returning κ = ∅ (line 15): she knows that no movement of λ will extend I.

#### 3.1 Algorithm Description

Algorithm 1 presents the pseudo-code for an exact core-guided algorithm for MOCO. Figure 1 illustrates an abstract execution of the algorithm.

Let ⟨φ, V, F⟩ be a MOCO instance. Recall that φ denotes the set of PB constraints, V is the set of variables and F denotes the list of m objective functions. First, the algorithm builds a working formula from the problem constraints and a unary counter for each objective function (lines 3-4). This is accomplished by the call to EncodeOrder. Next, a vector λ of size m is initialized with the lower bound of each objective function (line 5), assumed to be 0 for simplicity.

At each iteration of the main loop, the assumptions α are assembled from order variables o, chosen with the value of λ in mind (line 7). The call to next(i, λ)<sup>2</sup> returns the next smallest value belonging to the image of the objective i. Given the semantics of the order variables o<sub>i,k</sub> (Definition 14), the tuple λ fences the search space, i.e. ν satisfies α only if the corresponding tuple x satisfies F(x) ⪯ λ.

<sup>2</sup> May be replaced by λ<sub>i</sub> + 1.

Algorithm 1: Core-Guided MOCO solver


If the SAT call (line 10) returns a solution (i.e. ν ≠ ∅), x is stored in I and all dominated solutions are removed from I (line 11). Moreover, one can readily block all feasible solutions dominated by x using a single clause (line 13) [28].

Usually, there are several feasible fenced solutions. This occurs because the algorithm may increase multiple entries of λ at once. In any case, the inner while loop (lines 9-14) collects all such solutions.

When the working formula φ̃ becomes unsatisfiable, the SAT solver provides a core κ. If κ is empty (line 15), then the unsatisfiability does not depend on the assumptions, i.e. it does not depend on the temporary bounds imposed on the objective functions. At that point, we can conclude that no more solutions exist that are both feasible and not dominated by an element of I. As a result, the algorithm can safely terminate (line 16). Otherwise, the literals in κ denote a subset of the fence walls λ<sub>i</sub> that may be too restrictive, in the sense that unless we increment them (line 19) no new non-dominated solutions can be found.
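The control flow described above can be sketched in a few lines of Python. This is a minimal illustration under strong assumptions, not the authors' implementation: the SAT oracle is replaced by brute-force enumeration, the fence is checked directly on objective values instead of through order variables, a core is obtained by greedily dropping fence walls, and each wall is advanced by λ<sub>i</sub> + 1 as allowed by footnote 2. All names (`core_guided`, `weakly_dominates`, `solve`) are hypothetical.

```python
from itertools import product

def weakly_dominates(a, b):
    """a is no worse than b in every objective (minimisation)."""
    return all(p <= q for p, q in zip(a, b))

def core_guided(n_vars, feasible, objectives):
    F = lambda x: tuple(f(x) for f in objectives)
    space = [x for x in product((0, 1), repeat=n_vars) if feasible(x)]
    lam = [0] * len(objectives)      # fence bound, lower bounds assumed to be 0
    incumbents = []                  # the list I

    def solve(walls):
        """Stand-in for the SAT call: a feasible, unblocked point inside the fence."""
        for x in space:
            blocked = any(weakly_dominates(F(y), F(x)) for y in incumbents)
            if not blocked and all(F(x)[i] <= lam[i] for i in walls):
                return x
        return None

    while True:
        walls = list(range(len(objectives)))
        # Inner loop: collect every fenced solution, dropping dominated incumbents.
        while (x := solve(walls)) is not None:
            incumbents = [y for y in incumbents
                          if not weakly_dominates(F(x), F(y))]
            incumbents.append(x)
        # Unsatisfiable: greedily shrink the set of walls to a minimal core.
        core = list(walls)
        for i in walls:
            trial = [j for j in core if j != i]
            if solve(trial) is None:
                core = trial
        if not core:                 # unsatisfiable without any assumption
            return incumbents
        for i in core:               # relax only the walls named by the core
            lam[i] += 1

# Toy instance (assumed): at least two variables set, two overlapping costs.
front = core_guided(3, lambda x: sum(x) >= 2,
                    [lambda x: x[0] + x[1], lambda x: x[1] + x[2]])
```

On this toy instance the only non-dominated objective vector is (1, 1), attained by x = (1, 0, 1). A real implementation would obtain the core from an incremental SAT solver rather than by repeated enumeration.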

#### 3.2 Algorithm Properties

Lemma 1. The img-front Y of I ∪ Z(φ̃) (Definition 9) is not changed by the inner loop (lines 9-14).

Proof. Consider some particular iteration of the internal loop. Line 11 and line 13 remove all elements of I ∪ Z(φ̃) that are dominated by the feasible point x. Line 11 filters the explicit set I, line 13 filters the implicit set Z(φ̃). Solutions that are strictly dominated by x cannot be mapped into an element of Y. The other solutions x′ that are filtered out must attain the same objective vector attained by x, F(x′) = F(x). Because x is also inserted at line 11, removing x′ will not disturb Y.

Lemma 2. At the start of each iteration of the external loop (lines 6-19), every solution in I is optimal, and no two elements of I attain the same objective vector.

Proof. We prove this by contradiction. Assume that there is a non-optimal solution x ∈ I at the start of the external loop (line 6). In the first iteration, this does not occur because I is empty. Hence, this can only occur if the inner loop (lines 9-14) finishes with a non-optimal solution x ∈ I.

The inner loop (lines 9-14) enumerates solutions inside the fence defined by λ. We know that F(x) ⪯ λ because it is inside the fence and the entries of λ never decrease. If x is non-optimal, then there must be an optimal solution x′ such that F(x′) ≺ F(x) (⪯ λ). Hence, x′ is also inside the fence. As a result, x′ must be found before the inner loop finishes, since at each iteration only dominated solutions are blocked (line 13). If x is found before x′, then x is excluded from I (line 11) when x′ is found. Otherwise, if x′ is found first, then x is not found by the SAT solver (blocked at line 13) because it is dominated by x′. Therefore, we cannot have a non-optimal solution x ∈ I at the end of the inner loop or at the start of each iteration of the external loop (lines 6-19). Furthermore, no two elements of I attain the same objective vector since when a solution x is found, all other solutions x′ such that F(x) = F(x′) are also blocked (line 13).

Lemma 2 establishes a weaker form of anytime optimality. The elements of the incumbent list I are not necessarily optimal at any time, but they are optimal immediately after completing the inner loop. The algorithm can easily be made anytime optimal: instead of adding solutions directly to I in the inner loop, it maintains a secondary list, where it stores the solutions that are not yet known to be optimal. This list takes the role of I inside the inner loop. After completing the inner loop, all elements of the secondary list are optimal, and can be safely transferred to the main list I.

#### Proposition 1. Algorithm 1 is sound.

Proof. If the algorithm returns, Z(φ̃ ∧ α) = ∅. Because κ is empty, no core of the unsatisfiable formula φ̃ ∧ α intersects α, and φ̃ is also unsatisfiable, Z(φ̃) = ∅. Using Lemma 1 both at the end and at the start of the course of the algorithm, the img-front of I is the img-front of Z(φ̃), with φ̃ given by line 4. Because the order variables are only restricted by the unary counter formula, the img-front of Z(φ̃) is the img-front of Z(φ). Therefore I must contain an arg-front of the problem. Using Lemma 2, every element of I is optimal, and there is no pair x, x′ ∈ I such that F(x) = F(x′). Therefore, I is an arg-front of the MOCO instance.

#### Proposition 2. Algorithm 1 is complete.

Proof. The inner loop always terminates, because in the worst case it will generate every feasible solution dominated by the current λ once, and the feasible space is finite.

If the algorithm does not return for some particular instance, then κ is never empty. In that case, every iteration of the external loop starting at line 6 will increase at least one of the entries of λ. Eventually, one entry i must reach the upper limit of f<sub>i</sub>, and the order variable o<sub>i,λ<sub>i</sub>+1</sub> will not exist. Because the evolution of λ<sub>i</sub> is monotonic, the assumptions will contain at most m − 1 variables from that point on. By the same token, the assumptions α will eventually be empty, and so must be κ ⊆ α, contradicting the assumption that the algorithm never terminates.

#### 4 Hitting Set-based Algorithm

This section proposes a MOCO solver based on the enumeration of hitting sets. The main idea is to compute a sequence of relaxations ψ of the formula φ, and solve the corresponding problems. The front T of the relaxed problem gets incrementally closer to the desired front Z, and will eventually reach it.

#### 4.1 Algorithm Description

Algorithm 2 contains the pseudo-code for our hitting set-based algorithm for MOCO. Figure 2 illustrates an abstract execution of the algorithm.

The algorithm starts by setting the relaxed formula ψ to empty (line 1). The main loop that starts at line 2 hones the relaxation until we get the desired result. At each iteration, we solve the current relaxed formula ψ at line 4. This is accomplished by using some MOCO solver. Because this amounts to computing a lower bound set, the Core-Guided algorithm, previously described, is a good choice for the task. We anticipate that it performs well for problems whose front is in the vicinity of the origin, given that by construction, the focus of its search is biased to that region. Notice that the first relaxation's arg-front is the set that contains the origin only (assuming all literals in the objective functions are positive). We expect that the first few relaxations will stay close to it.

Next, for each element x in T (the Pareto-front of ψ), we check the φ-feasibility of ν : ν(V) = x, using the assumptions mechanism, and return a (possibly empty) core of assumptions κ. The assumptions α<sub>x</sub> built at line 6 are a set of unit clauses whose polarity is inherited from ν,

$$\nu(x\_i) \implies x\_i \in \alpha, \quad \neg \nu(x\_i) \implies \neg x\_i \in \alpha. \tag{7}$$

Assuming φ is satisfiable, the returned core κ will be empty iff α<sub>x</sub> ∧ φ is satisfiable. In this case, x corresponds to an optimal solution.


The diagnosis ∆ is central to the algorithm. Intuitively, it reports if and why the relaxed problem's solution is different from the true Pareto solution. We add every non-empty κ to the diagnosis ∆ (line 9). In the end, ∆ is empty iff every element of the relaxed front T is φ-feasible. At that point, we have found a φ-feasible lower bound set. All such sets are arg-fronts, and so the algorithm terminates (line 11). Otherwise, if ∆ is not empty, then the found cores are added to the relaxed formula ψ (line 13). This step ensures all tentative points produced in line 4 hit all previously found unsatisfiable cores, and that the algorithm advances monotonically towards the solution.
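The loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the relaxed problem is solved by brute-force enumeration instead of the Core-Guided algorithm, φ-feasibility is tested by a Python predicate instead of a SAT call, and cores are minimised by greedily dropping literals. A core is stored as a list of (variable index, value) pairs and stands for the clause ¬κ. All names are hypothetical.

```python
from itertools import product

def weakly_dominates(a, b):
    """a is no worse than b in every objective (minimisation)."""
    return all(p <= q for p, q in zip(a, b))

def arg_front(points, F):
    """One representative per non-dominated objective vector."""
    front = {}
    for x in points:
        v = F(x)
        if not any(weakly_dominates(F(y), v) and F(y) != v for y in points):
            front.setdefault(v, x)
    return list(front.values())

def hitting_sets(n_vars, feasible, objectives):
    F = lambda x: tuple(f(x) for f in objectives)
    universe = list(product((0, 1), repeat=n_vars))
    models = [x for x in universe if feasible(x)]     # phi-feasible points
    psi = []                                          # relaxation: list of cores

    def psi_feasible(y):
        # y satisfies the clause "not kappa" iff it differs from kappa somewhere.
        return all(any(y[i] != v for i, v in kappa) for kappa in psi)

    def core_of(x):
        """Minimal subset of the assumptions built from x that contradicts phi."""
        kappa = [(i, x[i]) for i in range(n_vars)]
        for lit in list(kappa):
            trial = [l for l in kappa if l != lit]
            if not any(all(z[i] == v for i, v in trial) for z in models):
                kappa = trial
        return kappa

    while True:
        T = arg_front([y for y in universe if psi_feasible(y)], F)
        diagnosis = [core_of(x) for x in T if not feasible(x)]
        if not diagnosis:             # every element of T is phi-feasible
            return T
        psi.extend(diagnosis)         # tighten the relaxation

# Toy instance (assumed): at least two variables set, two overlapping costs.
front = hitting_sets(3, lambda x: sum(x) >= 2,
                     [lambda x: x[0] + x[1], lambda x: x[1] + x[2]])
```

On this toy instance the sketch needs four relaxations: the first three yield the infeasible points (0, 0, 0), (0, 0, 1) and (0, 1, 0), each contributing one clause, before the feasible point (1, 0, 1) is returned.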

#### 4.2 Algorithm Properties

Given a MOCO instance ⟨φ, V, F⟩, the formula φ encodes the feasible space Z implicitly, which in turn defines the desired front Z̄. This is a many-to-one correspondence, in the sense that there are many different formulas ψ that encode the same Pareto front. It may happen that some of the counterpart instances are easier to solve than the original one, which raises the question: given φ, can we effectively find a simpler formula ψ with the same Pareto front? This is the guiding idea of the proposed algorithm. It is done by iteratively honing a relaxed formula (Definition 4).

The main idea is to compute a sequence of relaxations that get incrementally tighter. In that case, the corresponding front T̄ gets incrementally closer to the desired front Z̄,

$$\phi \qquad \implies \qquad \psi\_n \qquad \implies \qquad \dots \qquad \implies \qquad \psi\_1,\tag{8}$$

$$
\overline{Z} \qquad \qquad \succeq \qquad \overline{T}\_n \qquad \qquad \succeq \qquad \qquad \dots \qquad \qquad \succeq \qquad \qquad \overline{T}\_1, \tag{9}
$$

Fig. 2: Illustration of a run of the Hitting-Sets (Algorithm 2) in the objective space. The Pareto front is the set {1, 2, 3}, and the feasible solutions are marked by . For each iteration of the main while loop at line 2 we get a narrower lower bound T (line 4), culminating in the solution. We are done in 3 iterations, marked by A, B and . The shading represents the number of iterations whose freshly found points dominate the region. The lighter tone was painted by A, the darker one by all three. We start with the empty formula (line 1) and get A. Because the only point in A is not feasible, we tighten the relaxation (line 13). Iteration B generates one feasible point, 1, which is therefore optimal. Note that the region dominated by 1 can be pruned from now on. The other point is used to tighten the formula once more. Lastly, the lower bound contains the feasible points 2 and 3 in addition to 1, which was already found, and the algorithm stops.

where Z̄ is one of the desired arg-fronts, and T̄<sub>i</sub> is an arg-front of ψ<sub>i</sub>.

Lemma 3. Consider some multi-objective function F : X → Y. Let Z, T be subsets of X, such that T ⊇ Z. Then, any arg-front of T is a lower bound set of any arg-front of Z (Definition 13), i.e. T ⊇ Z ⟹ T̄ ⪯ Z̄.

Lemma 3 is true because optimizing over a superset of some feasible space always returns a (non-strict) lower bound set. In a sense, the optimization can only be more extreme when applied to the superset. In particular, the feasible space of a relaxed formula is a superset of the original one. This is why the chain of relations in Equation (9) is correct.
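The claim can be checked numerically on a small example. The sketch below works directly with objective vectors and assumes that a set L is a lower bound set of a set U when every vector of U is weakly dominated by some vector of L. This reading of Definition 13, the function names and the sample vectors are assumptions made for illustration.

```python
def weakly_dominates(a, b):
    """a is no worse than b in every objective (minimisation)."""
    return all(p <= q for p, q in zip(a, b))

def img_front(vectors):
    """Non-dominated objective vectors of a finite set."""
    vs = set(vectors)
    return sorted(v for v in vs
                  if not any(weakly_dominates(u, v) and u != v for u in vs))

def is_lower_bound_set(lower, upper):
    """Every vector of `upper` is weakly dominated by some vector of `lower`."""
    return all(any(weakly_dominates(l, u) for l in lower) for u in upper)

original = [(1, 1), (1, 2), (2, 1), (2, 2)]   # images of the feasible space
relaxed = original + [(0, 1), (3, 0)]         # images of a superset

front_original = img_front(original)
front_relaxed = img_front(relaxed)
```

Here the front of the superset, {(0, 1), (3, 0)}, is a lower bound set of the original front {(1, 1)}, while the converse does not hold.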

Lemma 4. Let φ be a formula, Z ⊆ X be its feasible space and F : X → Y be some multi-objective function. Let L be a lower bound set of the Pareto front of Z. Then, any element x ∈ L that is feasible belongs to the Pareto front, L ∩ Z ⊆ P. If all elements x ∈ L are feasible, then L is an arg-front.

Lemma 4 implies that every lower bound set with only feasible elements must be itself an arg-front (this is an exact analogy with the single-objective case, where lower bound set is replaced by infimum and arg-front by arg-min.) By construction of the diagnosis ∆, this is equivalent to the condition used in Algorithm 2 to decide if it can terminate.

To ensure the sequence gets to Z in a finite number of steps, we need more than a string of relaxations. Each entry ψ′ must be strictly tighter than the predecessor ψ.

Lemma 5. Consider Algorithm 2. Let ψ be the relaxed formula at some iteration, and ψ′ be the relaxed formula at the next iteration. Then,


Proof. Each statement will be proven in turn.

The first is true because ψ ⊆ ψ′, by construction (line 13).

We prove the second by induction on the number of iterations. Initially, ψ is empty. Therefore, ψ relaxes any formula, in particular φ. Assume φ ⊨ ψ for some iteration. Consider one of the clauses ¬κ added at line 13. We know that φ ∧ κ is unsatisfiable, i.e. φ ∧ κ ⊨ ⊥. Hence ¬(φ ∧ κ) is valid, which is equivalent to φ ⊨ ¬κ. Given the assumption φ ⊨ ψ, we get φ ⊨ ψ ∧ ¬κ. Repeating the process for the other added clauses ¬κ<sub>i</sub>, we get φ ⊨ ψ ∧ ¬κ<sub>1</sub> ∧ … ∧ ¬κ<sub>n</sub> ≡ ψ′.

Assume, for the sake of contradiction, that ψ′ is a relaxation of ψ. Then, any ψ-feasible model ν is also ψ′-feasible. We will prove there is at least one model that violates this. To start, note that it only makes sense to consider ψ′ if there is some non-empty core κ in the diagnosis ∆; otherwise, the algorithm would have terminated before updating ψ into ψ′. Let κ be one element of ∆, generated at line 7 while ψ is current. Consider the Boolean tuple x ∈ T used to build the assumptions of the query that generated κ. Let ν : ν(V) = x. The model ν is ψ-feasible, because it is part of the arg-front of ψ. The model ν satisfies κ because κ ⊆ α<sub>x</sub> and the way α<sub>x</sub> is constructed (line 6, Equation (7)). Therefore, ν does not satisfy ¬κ. Because ¬κ ⊆ ψ′, ν cannot satisfy ψ′, i.e. there is at least one ψ-feasible model that is not ψ′-feasible.

#### Proposition 3. Algorithm 2 is sound.

Proof. By Lemma 5, ψ relaxes φ and therefore T solves a relaxation of the original problem. By Lemma 3, it is a lower bound set of Z. When the algorithm returns, all elements of T are feasible. By Lemma 4, T must be an arg-front.

#### Proposition 4. Algorithm 2 is complete.

Proof. Assume Algorithm 2 never ends, implying T is never completely feasible (i.e. T ⊈ Z). The number of relaxed feasible spaces T is finite. If Algorithm 2 does not end, it will enumerate all of them, never repeating any: at any iteration, the updated relaxed formula effectively blocks the reappearance of any feasible space seen before, because by Lemma 5 the updated value ψ′ strictly tightens ψ. Then, this sequence is necessarily finite, and so must be the number of iterations. But in that case, Algorithm 2 must end, and we have a contradiction.

Consider the sequence whose entries are the value of F(T) computed at the beginning of each iteration of the main loop at line 2. The last element of this sequence is the solution. It may happen that for some i, the entries indexed by i and i + 1 are the same. Therefore, the sequence may include blocks of contiguous entries that share the same value. In the worst case scenario, there are many different arg-fronts for the same img-front, and the algorithm ends up enumerating all of them without any movement in the objective space. We expect the algorithm will be effective whenever a few of the relaxed problems are enough to get to the full solution. Otherwise, we can end up solving an exponential number of problems.

#### 5 Experimental Results

This section evaluates the performance of the algorithms proposed<sup>3</sup> in Sections 3 and 4. These algorithms are compared against other SAT-based MOCO solvers.

#### 5.1 Algorithms and Implementation

The Core-Guided algorithm proposed in Algorithm 1 uses the selection delimiter encoding [14] that has been shown to be more compact. Next, the selection delimiter encoding is extended to produce a unary encoding for each objective function. Additionally, an order encoding [29] is also used. We refer the interested reader to the literature for further details on this and other encodings [27,13,14,15]. Observe that any unary encoding from PB into CNF can be used.

The Hitting-Sets algorithm implements Algorithm 2. This hitting set-based approach uses Algorithm 1 to find the relaxed arg-front (line 4 of Algorithm 2).

The P-Minimal algorithm implements a SAT-UNSAT approach based on the enumeration of P-Minimal models [28]. This algorithm is implemented with the same PB to CNF encoding as the Core-Guided. Finally, the ParetoMCS is based on the stratified enumeration of Minimal Correction Subsets. We used the publicly available implementation of ParetoMCS<sup>4</sup>.

#### 5.2 Experimental Setup and Benchmark Sets

The following MOCO problems are considered: the multi-objective Development Assurance Level (DAL) Problem [5], the multi-objective Flying Tourist Problem (FTP) [22], the multi-objective Set Covering (SC) Problem [4,28] and the multi-objective Package Upgradeability (PU) Problem [11]. All instances are publicly available from previous research work or were generated from real-world data.

The DAL benchmark set (95 instances) encodes different levels of rigor in the development of a software or hardware component of an aircraft. The development assurance level defines the assurance activities aimed at eliminating design and coding errors that could affect the safety of an aircraft. The goal is to allocate the smallest DAL to functions to decrease the development costs [18].

<sup>3</sup> Available at https://gitlab.inesc-id.pt/u001810/moco

<sup>4</sup> https://gitlab.ow2.org/sat4j/moco

The FTP benchmark set (129 instances) encodes the problem of a tourist who is searching for a flight travel route to visit n cities. The tourist defines her home city, the start and end of the route. She specifies the number of days d<sub>i</sub> to be spent in each city c<sub>i</sub> (1 ≤ i ≤ n) and also a time window for the complete trip. The problem is to find the route that minimizes the time spent on flights and the sum of the prices of the tickets<sup>5</sup>.

The SC benchmark set (60 instances) is a generalization of the set covering problem and was used in previous research work [28]. Let X be some ground set and A a cover of X. Each element in A has an associated cost tuple. The goal is to find a cover of X contained in A that Pareto-optimizes the overall cost.

The PU benchmark set (687 instances) was generated from the Package Upgradeability benchmarks [19] from the Mancoosi International Solver Competition [20]. The packup tool [25] was used to generate these benchmarks, which contain between two and five objectives to optimize.

All results were obtained on an Intel Xeon Silver 4110 CPU @ 2.10GHz, with 64 GB of RAM. Each tool was executed on each instance with a time limit of 1 hour and a memory limit of 10 GB.

#### 5.3 Results and Analysis

Table 1 shows the number of instances whose Pareto front is completely enumerated, for each algorithm and benchmark set. Overall, the new unsatisfiability-based algorithms proposed in the paper completely solve more instances than the ParetoMCS and the P-Minimal algorithms. Note that the ParetoMCS solves the fewest instances, since it needs to enumerate all MCSs. The Core-Guided and Hitting-Sets converge faster to the Pareto front due to their UNSAT-SAT approach, while the P-Minimal is slower to converge. Overall, the Core-Guided algorithm is able to solve more instances than the other algorithms.

All tested algorithms are exact, but in some cases only an approximation of the Pareto front could be found within the time limit. However, the partial solution that is returned may still be valuable. In order to evaluate the quality of the approximations provided by each tool, we use the Hypervolume (HV) [34] indicator. HV is a metric that measures the volume of the objective space dominated by a set of points in the objective space, up to a given reference point. The coordinates of the reference point chosen are the maximal values of each objective. Regions that are not dominated by a reference front are discarded (we combined the results for each algorithm in order to produce the reference front). Larger values are preferred. A normalization procedure is carried out so that the values of HV are always between 0 and 1.
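For two objectives, the HV of a set of points can be computed with a simple sweep. The sketch below is only an illustration under stated assumptions: it handles the bi-objective minimisation case, and it normalises by dividing by the HV of the reference front, which is one plausible normalisation and not necessarily the procedure used in the experiments. It does not implement the discarding of regions not dominated by the reference front. The function names and sample points are hypothetical.

```python
def hypervolume_2d(points, reference):
    """Area dominated by `points` (minimisation) and bounded by `reference`."""
    rx, ry = reference
    inside = sorted(p for p in set(points) if p[0] <= rx and p[1] <= ry)
    area, best_y = 0, ry
    for x, y in inside:              # sweep from left to right
        if y < best_y:               # the point adds a new horizontal strip
            area += (rx - x) * (best_y - y)
            best_y = y
    return area

def normalised_hv(approximation, reference_front, reference):
    """HV of an approximation relative to the HV of a reference front."""
    full = hypervolume_2d(reference_front, reference)
    return hypervolume_2d(approximation, reference) / full if full else 0.0

reference_front = [(1, 3), (2, 1)]   # assumed combined front
reference_point = (4, 4)             # assumed maximal value of each objective
hv_full = hypervolume_2d(reference_front, reference_point)
hv_partial = hypervolume_2d([(2, 3)], reference_point)
```

In this example the reference front dominates an area of 7, while the single-point approximation (2, 3) dominates an area of 2, so its normalised HV is 2/7.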

Figure 3 shows a cactus plot of the HV for all tools on each benchmark set. The P-Minimal provides better quality approximations of the Pareto front in the DAL (Figure 3a) and PU (Figure 3d) benchmarks since it uses a SAT-UNSAT approach. Hence, it is faster to find an approximation to the Pareto front. Moreover, since some of the instances in these sets have higher optimal values on the objective functions, the Core-Guided and Hitting-Sets take many iterations until they reach the feasible part of the search space. Despite performing an unsatisfiability-based search, the Core-Guided and Hitting-Sets algorithms are still able to provide good quality solutions, since the solutions these algorithms find are in the Pareto front. Moreover, observe that even in these sets of instances, Core-Guided is still able to find the whole Pareto front in more instances.

<sup>5</sup> Instances generated from flights in Europe between October and December 2019.

Table 1: Number of MOCO instances whose complete solution is found and certified per algorithm and benchmark set. Best results are in bold.

The ParetoMCS is able to provide good quality approximations in the FTP (Figure 3b) and PU (Figure 3d) benchmarks. Note that ParetoMCS does not use an explicit representation of the objective functions. The FTP instances have several large coefficients in the objective functions, but the representation used in Core-Guided is still effective for these instances. Observe that the performance of both algorithms is similar in the FTP dataset.

The Hitting-Sets finds poor approximations for all datasets. A common feature of this algorithm is the need to enumerate many hitting sets before being able to find feasible solutions. Hence, in several instances it is unable to provide good approximations. However, it is still able to prove optimality for more instances in the SC benchmark set than the P-Minimal algorithm.

Overall, the Core-Guided is the best performing algorithm being able to find the complete Pareto frontier in more instances. This is due to the fact that in many cases, it does not need to relax all variables to find solutions in the Pareto front. Moreover, when evaluating the quality of the approximations, it is still able to outperform the other approaches on the FTP and SC benchmark sets, despite applying an unsatisfiability-based approach.

#### 6 Conclusions

This paper proposes two new algorithms for Multi-Objective Combinatorial Optimization (MOCO). The first is a core-guided approach, while the second is based on the enumeration of hitting sets. These are the first SAT-based algorithms that fully integrate these strategies into a MOCO solver.

Fig. 3: Comparison of the HV results for each set of instances. Each series is sorted independently, smaller values first. Vertical scale is logarithmic.

Experimental results on different sets of benchmark instances show that the new core-guided approach results in a robust algorithm that outperforms other SAT-based algorithms for MOCO. Using unary counters to express Pareto dominance in CNF proved to be an effective way to harness the power of SAT solvers in solving MOCO. The ability to express concepts related to dominance makes the algorithms conceptually simple.

Overall, the new algorithms are able to completely enumerate the Pareto front for more instances than previous SAT-based approaches. Moreover, despite following an unsatisfiability-based approach, the newly proposed algorithms are also able to provide good quality approximations even when they are unable to completely enumerate the Pareto front. Hence, these new unsatisfiability-based algorithms extend the state of the art for MOCO solvers by complementing and improving upon the existing tools based on queries to SAT Oracles.

Acknowledgements This work was supported by Portuguese national funds through FCT under projects UIDB/50021/2020, PTDC/CCI-COM/2156/2021, 2022.03537.PTDC and project ANI 045917 funded by FEDER and FCT.

### References





## Verified Reductions for Optimization

Alexander Bentkamp<sup>1,2</sup>, Ramon Fernández Mir<sup>3</sup>, and Jeremy Avigad<sup>4</sup>

<sup>1</sup> Heinrich-Heine-Universität Düsseldorf, Düsseldorf, Germany bentkamp@gmail.com

<sup>2</sup> State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing, China

<sup>3</sup> School of Informatics, University of Edinburgh, Edinburgh, UK

<sup>4</sup> Carnegie Mellon University, Pittsburgh, PA, USA

Abstract. Numerical and symbolic methods for optimization are used extensively in engineering, industry, and finance. Various methods are used to reduce problems of interest to ones that are amenable to solution by these methods. We develop a framework for designing and applying such reductions, using the Lean programming language and interactive proof assistant. Formal verification makes the process more reliable, and the availability of an interactive framework and ambient mathematical library provides a robust environment for constructing the reductions and reasoning about them.

Keywords: convex optimization · formal verification · interactive theorem proving · disciplined convex programming

### 1 Introduction

Optimization problems and constraint satisfaction problems are ubiquitous in engineering, industry, and finance. These include the problem of finding an element of ℝ<sup>n</sup> satisfying a finite set of constraints or determining that the constraints are unsatisfiable; the problem of bounding the value of an objective function over a domain defined by such a set of constraints; and the problem of finding a value of the domain that maximizes (or minimizes) the value of an objective function. Linear programming, revolutionized by Dantzig's introduction of the simplex algorithm in 1947, deals with the case in which the constraints and objective function are linear. The development of interior point methods in the 1980s allowed for the efficient solution of problems defined by convex constraints and objective functions, which gave rise to the field of convex programming [10,36,43]. Today there are numerous back-end solvers for convex optimization problems, including MOSEK [30], SeDuMi [41], and Gurobi [23]. They employ a variety of methods, each with its own particular strengths and weaknesses. (See [1, Section 1.2] for an overview.)

Using such software requires interpreting the problem one wants to solve in terms of one or more associated optimization problems. Often, this is straightforward; proving the safety of an engineered system might require showing that a certain quantity remains within specified bounds, and an industrial problem might require determining optimal or near-optimal allocation of certain resources. Other applications are less immediate. For example, proving an interesting mathematical theorem may require a lemma that bounds some quantity of interest (e.g. [4]). Once one has formulated the relevant optimization problems, one has to transform them into ones that the available software can solve, and one has to ensure that the conditions under which the software is designed to work correctly have been met. Mathematical knowledge and domain-specific expertise are often needed to transform a problem to match an efficient convex programming paradigm. A number of modeling packages then provide front ends that apply further transformations so that the resulting problem conforms to a back-end solver's input specification [15,20,26,17,42]. The transformed problem is sent to the back-end solver and the solver produces a response, which then has to be reinterpreted in terms of the original problem.

Our goal here is to develop ways of using formal methods to make the passage from an initial mathematical problem to the use of a back-end solver more efficient and reliable. Expressing a mathematical problem in a computational proof assistant provides clarity by endowing claims with a precise semantics, and having a formal library at hand enables users to draw on a body of mathematical facts and reasoning procedures. These make it possible to verify mathematical claims with respect to the primitives and rules of a formal axiomatic foundation, providing strong guarantees as to their correctness. Complete formalization places a high burden on practitioners and often imposes a standard that is higher than users want or need, but verification is not an all-or-nothing affair: users should have the freedom to decide which results they are willing to trust and which ones ought to be formally verified.

With respect to the use of optimization software, the soundness of the software itself is one possible concern. Checking the correctness of a solution to a satisfaction problem is easy in principle: one simply plugs the result into the constraints and checks that they hold. Verifying the correctness of a bounding problem or optimization problem is often almost as easy, in principle, since the results are often underwritten by the existence of suitable certificates that are output by the optimization tools. In practice, these tasks are made more difficult by the fact that floating point calculation can introduce numerical errors that bear on the correctness of the solution.

Here, instead, we focus on the task of manipulating a problem and reducing it to a form that a back-end solver can handle. Performing such transformations in a proof assistant offers strong guarantees that the results are correct and have the intended meaning, and it enables users to perform the transformations interactively or partially, and thus introspect and explore the results of individual transformation steps. Moreover, in constructing and reasoning about the transformations, users can take advantage of an ambient mathematical library, including a database of functions and their properties.

In Section 3, we describe the process that CVXPY and other systems use to transform optimization problems expressed in the disciplined convex program (DCP) framework to conic form problems that can be sent to solvers like MOSEK [30]. In Section 4, we explain how our implementation in the Lean programming language and proof assistant [33,32] augments that algorithm so that it at the same time produces a formal proof that the resulting reduction is correct. DCP relies on a library of basic atoms that serve as building blocks for reductions, and in Section 5, we explain how our implementation makes it possible to add new atoms in a verified way. In Section 6, we provide an example of the way that one can further leverage the power of an interactive theorem prover to justify the reduction of a problem that lies outside the DCP framework to one that lies within, using the mathematical library to verify its correctness. In Section 7, we describe our interface between Lean and an external solver, which transforms an exact symbolic representation of a problem into a floating point approximation. Related work is described in Section 8 and conclusions are presented in Section 9.

We have implemented these methods in a prototype, CvxLean.<sup>5</sup> We offer more information about the implementation in Section 9. A preliminary workshop paper [6] described our initial plans for this project and the reduction framework presented here in Section 2.

### 2 Optimization Problems and Reductions

The general structure of a minimization problem is expressed in Lean 4 as follows:

```
structure Minimization (D R : Type) :=
  (objFun : D → R)
  (constraints : D → Prop)
```
Here the data type D is the domain of the problem and R is the data type in which the objective function takes its values. The field objFun represents the objective function and constraints is a predicate on D, which, in Lean, is represented as a function from D to propositions: for every value a of the domain D, the proposition constraints a, which says that the constraints hold of a, is either true or false. The domain D is often ℝ<sup>n</sup> or a space of matrices, but it can also be something more exotic, like a space of functions. The data type R is typically the real numbers, but in full generality it can be any type that supports an ordering. A maximization problem is represented as a minimization problem for the negation of the objective function.

A feasible point for the minimization problem p is an element point of D satisfying p.constraints. Lean's foundational framework allows us to package the data point with the condition that it satisfies those constraints:

```
structure FeasPoint {D R : Type} [Preorder R] (p : Minimization D R) :=
  (point : D)
  (feasibility : p.constraints point)
```
The curly and square brackets denote parameters that can generally be inferred automatically. A solution to the minimization problem p is a feasible point, denoted point, such that for every feasible point y the value of the objective function at point is smaller than or equal to the value at y.

<sup>5</sup> https://github.com/verified-optimization/CvxLean

```
structure Solution {D R : Type} [Preorder R] (p : Minimization D R) :=
  (point : D)
  (feasibility : p.constraints point)
  (optimality : ∀ y : FeasPoint p, p.objFun point ≤ p.objFun y.point)
```
Feasibility and bounding problems can also be expressed in these terms. If the objective function is constant (e.g. the constant zero function), a solution to the optimization problem is simply a feasible point. Given a domain, an objective function, and constraints, the value b is a strict lower bound on the value of the objective function over the domain if and only if the feasibility problem obtained by adding the inequality objFun x ≤ b to the constraints has no solution.
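As a minimal, self-contained sketch of how these structures fit together, the following toy instance packages a solution to the problem of minimizing x over the natural numbers subject to 3 ≤ x. To avoid depending on Mathlib, it restates the structures with the core class LE in place of Preorder and uses the `where` field syntax. The names toy and toySol are hypothetical and are not part of CvxLean.

```lean
-- Restated from the text using core Lean only: `LE` stands in for `Preorder`.
structure Minimization (D R : Type) where
  objFun : D → R
  constraints : D → Prop

structure FeasPoint {D R : Type} [LE R] (p : Minimization D R) where
  point : D
  feasibility : p.constraints point

structure Solution {D R : Type} [LE R] (p : Minimization D R) where
  point : D
  feasibility : p.constraints point
  optimality : ∀ y : FeasPoint p, p.objFun point ≤ p.objFun y.point

-- Hypothetical toy problem: minimize x over Nat subject to 3 ≤ x.
def toy : Minimization Nat Nat where
  objFun := fun x => x
  constraints := fun x => 3 ≤ x

-- The point 3 is feasible, and feasibility of any y is exactly 3 ≤ y.
def toySol : Solution toy where
  point := 3
  feasibility := Nat.le_refl 3
  optimality := fun y => y.feasibility
```

The optimality proof is just the feasibility proof of the competing point, because for this problem both statements unfold to 3 ≤ y.point.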

Lean 4 allows us to implement convenient syntax for defining optimization problems. For example, the following specifies the problem of maximizing √(x − y) subject to the constraints y = 2x − 3 and x² ≤ 2:

```
optimization (x y : ℝ)
  maximize sqrt (x - y)
  subject to
    c1 : y = 2*x - 3
    c2 : x^2 ≤ 2
    c3 : 0 ≤ x - y
```
The third condition, c3, ensures that the objective function makes sense and is concave on the domain determined by the constraints. In some frameworks, like CVXPY, this constraint is seen as implicit in the use of the expression sqrt (x - y).


In Section 6, we will consider the covariance estimation problem for Gaussian variables, which can be expressed as follows, for a tuple of sample values y:

```
optimization (R : Matrix (Fin n) (Fin n) ℝ)
  maximize (∏ i, gaussianPdf R (y i))
  subject to
    c_pos_def : R.posDef
```
Here Matrix (Fin n) (Fin n) ℝ is Lean's representation of the data type of n×n matrices over the reals, gaussianPdf is the Gaussian probability density function defined in Section 6, and the constraint R.posDef specifies that R ranges over positive definite matrices.

If p and q are problems, a reduction from p to q is a function mapping any solution to q to a solution to p. The existence of such a reduction means that to solve p it suffices to solve q. If p is a feasibility problem, it means that the feasibility of q implies the feasibility of p, and, conversely, that the infeasibility of p implies the infeasibility of q. We can now easily describe what we are after: we are looking for a system that helps a user reduce a problem p to a problem q that can be solved by an external solver. (For a bounding problem q, the goal is to show that the constraints with the negated bound are infeasible by finding a reduction from an infeasible problem p.) At the same time, we wish to verify the correctness of the reduction, either automatically or with user interaction. This will ensure that the results from the external solver really address the problem that the user is interested in solving.

This notion of a reduction is quite general, and is not restricted to any particular kind of constraint or objective function. In the sections that follow, we explain how the notion can be applied to convex programming.
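The definition of a reduction, and the fact that reductions can be chained, can be sketched in a few lines. The block below again restates the structures using core Lean only (LE instead of Preorder). The names Reduction, map, and comp are hypothetical and need not match those used in CvxLean.

```lean
structure Minimization (D R : Type) where
  objFun : D → R
  constraints : D → Prop

structure FeasPoint {D R : Type} [LE R] (p : Minimization D R) where
  point : D
  feasibility : p.constraints point

structure Solution {D R : Type} [LE R] (p : Minimization D R) where
  point : D
  feasibility : p.constraints point
  optimality : ∀ y : FeasPoint p, p.objFun point ≤ p.objFun y.point

/-- A reduction from `p` to `q` maps any solution of `q` to a solution of `p`. -/
structure Reduction {D E R : Type} [LE R]
    (p : Minimization D R) (q : Minimization E R) where
  map : Solution q → Solution p

/-- Reductions compose: if `p` reduces to `q` and `q` reduces to `r`,
then to solve `p` it suffices to solve `r`. -/
def Reduction.comp {D E F R : Type} [LE R]
    {p : Minimization D R} {q : Minimization E R} {r : Minimization F R}
    (pq : Reduction p q) (qr : Reduction q r) : Reduction p r where
  map := fun s => pq.map (qr.map s)
```

Composition is what allows a user-defined reduction to be followed by an automatically generated one, with the correctness proofs carried along inside the Solution values.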

### 3 Reduction to Conic Form

Disciplined Convex Programming (DCP) is a framework for writing constraints and objective functions in such a way that they can automatically be transformed into problems that can be handled by particular back-end solvers. It aims to be flexible enough to express optimization problems in a natural way but restrictive enough to ensure that problems can be transformed to meet the requirements of the solvers. To start with, the framework guarantees that expressions satisfy the relevant curvature constraints [1,21], assigning a role to each expression. In particular, an expression f(expr<sub>1</sub>, . . . , expr<sub>n</sub>) is convex if f is convex and, for each i, one of the following conditions holds:

	- f is increasing in its ith argument and expr<sub>i</sub> is convex.
	- f is decreasing in its ith argument and expr<sub>i</sub> is concave.
	- expr<sub>i</sub> is affine.

The same rule holds with "convex" and "concave" interchanged.

An affine expression is both convex and concave. Some functions f come with side conditions on the range of arguments for which such curvature properties are valid; e.g. f(x) = √x is concave and increasing on the domain {x ∈ ℝ | x ≥ 0}.

A minimization problem is amenable to the DCP reduction if, following the rules above, its objective function is convex and the expressions occurring in its constraints are concave or convex, depending on the type of constraint. For example, maximizing √(x − y) requires minimizing −√(x − y), and the DCP rules tell us that the latter is a convex function of x and y on the domain where x − y ≥ 0, because x − y is affine, √· is concave and increasing in its argument, and negation is affine and decreasing in its argument.
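The composition rule can be illustrated by a small curvature calculator. The following sketch is restricted to unary atoms and uses only core Lean. The types Curvature and Monotonicity and the function composeUnary are hypothetical names introduced for illustration; the sketch ignores side conditions such as x − y ≥ 0.

```lean
inductive Curvature where
  | affine | convex | concave
  deriving Repr

inductive Monotonicity where
  | increasing | decreasing | neither
  deriving Repr

/-- Curvature of `f e` for a unary atom `f` with curvature `cf` and a given
monotonicity, applied to an argument `e` of curvature `ce`.
The result `none` means that the DCP rules certify no curvature. -/
def composeUnary : Curvature → Monotonicity → Curvature → Option Curvature
  | cf,       _,           .affine  => some cf
  | .convex,  .increasing, .convex  => some .convex
  | .convex,  .decreasing, .concave => some .convex
  | .concave, .increasing, .concave => some .concave
  | .concave, .decreasing, .convex  => some .concave
  | .affine,  .increasing, ce       => some ce
  | .affine,  .decreasing, .convex  => some .concave
  | .affine,  .decreasing, .concave => some .convex
  | _,        _,           _        => none

-- `x - y` is affine, `sqrt` is concave and increasing,
-- and negation is affine and decreasing.
#eval composeUnary .concave .increasing .affine        -- sqrt (x - y): concave
#eval (composeUnary .concave .increasing .affine).bind
        (composeUnary .affine .decreasing)             -- -sqrt (x - y): convex
```

An affine atom is treated as both convex and concave, which is why negation turns a concave argument into a convex expression.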

CvxLean registers the properties of atomic functions f(ā) in a library of atoms. Each such function f is registered with a formal representation expr<sub>f</sub>(ā) using expressions, like x \* log x or log (det A), that can refer to arbitrary functions defined in Lean's library. The atom also registers the relevant properties of f. The curvature of f, curv<sub>f</sub>, has one of the values convex, concave, or affine, and the monotonicity of the function in each of its arguments is tagged as increasing, decreasing, or neither. CvxLean also allows the value auxiliary, which indicates an expression that serves as a fixed parameter in the sense that it is independent of the variables in the optimization problem. Atoms can also come with background conditions bconds<sub>f</sub>(ā), which are independent of the domain variables, and variable conditions vconds<sub>f</sub>(ā), which constrain the domain on which the properties hold. Notably, the atoms also include proofs of properties that are needed to justify the DCP reduction.

By storing additional information with each atom, a DCP framework can use the compositional representation of expressions to represent a problem in a form appropriate to a back-end solver. For example, solvers like MOSEK expect problems to be posed in a certain conic form [30]. To that end, CVXPY stores a graph implementation for each atomic function f, which is a representation of f as the solution to a conic optimization problem. By definition, the graph implementation of an atomic function f is an optimization problem in conic form, given by a list of variables v̄, an objective function obj<sub>f</sub>(x̄, v̄), and a list of constraints constr<sub>f</sub>(x̄, v̄), such that the optimal value of the objective under the constraints is equal to f(x̄) for all x̄ in the domain of validity. For example, for any x ≥ 0, the concave function √x can be characterized as the maximum value of the objective function obj(x, t) = t satisfying the constraint constr(x, t) given by t² ≤ x. Once again, a notable feature of CvxLean is that the atom comes equipped with a formal proof of this fact.
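For concreteness, the graph implementations of the two atoms used in the running example can be written as optimization problems over the new variable t. The first line restates the characterization of the square root given above. The second, for the convex atom x², is an assumption based on the standard epigraph form and is not spelled out in the text.

$$\sqrt{x} = \max \lbrace t \in \mathbb{R} \mid t^2 \le x \rbrace \quad (x \ge 0), \qquad x^2 = \min \lbrace t \in \mathbb{R} \mid x^2 \le t \rbrace.$$

A concave atom is thus described by a maximization problem and a convex atom by a minimization problem, and in both cases the optimum is attained when the constraint holds with equality.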

The idea is that we can reduce a problem to the required form by iteratively replacing each application of an atomic function by an equivalent characterization in terms of the graph implementation. For example, we can replace a subexpression √(x − y) by a new variable t and add the constraint t² ≤ x − y, provided that the form of the resulting problem ensures that, for any optimal solution to the constraints, t will actually be equal to √(x − y). Given a well-formed DCP minimization problem, CvxLean must perform the reduction and construct a formal proof of the associated claims. In this section we describe the reduction, and in the next section we describe the proofs. A more formal description of both is given in an extended version of this paper [7].

Let e be a well-formed DCP expression. CvxLean associates to each such expression a tree T whose leaves are expressions that are affine with respect to the variables of the optimization problem. For example, the tree associated with the expression -sqrt (x - y) has the negation at its root, the square root as the only child of the root, and the subtraction as the only child of the square root, with the leaves x and y below it.

Alternatively, we could use a single leaf for x - y. Denoting the variables of the optimization problem by ȳ, we can recursively assign to each node n a subexpression oexpr<sub>n</sub>(ȳ) of e that corresponds to the subtree with root n. In the example above, the subexpressions are x, y, x - y, sqrt (x - y), and -sqrt (x - y). To each internal node, we assign a curvature, convex, concave, or affine, subject to the rules of DCP. An expression that is affine can be viewed as either convex or concave. Equalities and inequalities are also atoms; for example, e<sub>1</sub> ≤ e<sub>2</sub> describes a convex set if and only if e<sub>1</sub> is convex and e<sub>2</sub> is concave. A formalization of the DCP rules allows us to recursively construct formal proofs of these curvature claims, modulo the conditions and assumptions of the problem. We elaborate on this process in the next section.

Now consider a well-formed DCP minimization problem with objective function o and constraints c1, . . . , cn. We call these expressions the components of the problem. Recall the following example from the previous section, recast as a minimization problem:

```
optimization (x y : ℝ)
  minimize -sqrt (x - y)
  subject to
    c1 : y = 2*x - 3
    c2 : x^2 ≤ 2
    c3 : 0 ≤ x - y
```
Here the components are -sqrt (x - y), y = 2\*x - 3, x^2 ≤ 2, and 0 ≤ x - y.

First, we assign to each component c an atom tree T<sub>c</sub> as described above. If ȳ are the variables of the original problem, the variables of the reduced problem are ȳ ∪ z̄, where z̄ is a collection of variables consisting of a fresh set of variables for the graph implementation at each internal node of each tree, for those atoms whose graph implementations introduce new variables. To each node n of each atom tree, we assign an expression rexpr<sub>n</sub>(ȳ, z̄) in the language of the reduced problem representing the expression oexpr<sub>n</sub>(ȳ) in the original problem. At the leaves, rexpr<sub>n</sub>(ȳ, z̄) is the same as oexpr<sub>n</sub>(ȳ). At internal nodes we use the objective function of the corresponding atom's graph implementation, applied to the interpretation of the arguments. The objective of the reduced problem is the expression assigned to the root of T<sub>o</sub>.

As for the constraints of the reduced problem, recall that each internal node of the original problem corresponds to an atom, which has a graph implementation. The graph implementation, in turn, is given by a list of variables v̄, an objective function obj<sub>f</sub>(ā, v̄), and a list of constraints constr<sub>f</sub>(ā, v̄). These constraints, applied to the expressions representing the arguments, are part of the reduced problem. Moreover, the constraints of the original problem, expressed in terms of the reduced problem, are also constraints of the reduced problem, with one exception. Recall that atoms can impose conditions vconds<sub>f</sub>(ā), which are assumed to be among the constraints of the original problem and to be implied by the graph implementation. For example, the condition 0 ≤ x is required to characterize √x as the maximum value of t satisfying t^2 ≤ x, but, conversely, the existence of a t satisfying t^2 ≤ x implies 0 ≤ x. So a constraint 0 ≤ x that is present in the original problem to justify the use of sqrt x can be dropped from the reduced problem.

In the example above, there is a tree corresponding to each of the components -sqrt (x - y), x^2 ≤ 2, 0 ≤ x - y, and y = 2\*x - 3. As n ranges over the nodes of these trees, oexpr<sub>n</sub>(x, y) ranges over all the subexpressions of these components, namely, x, y, x - y, sqrt (x - y), -sqrt (x - y), x^2, 2, x^2 ≤ 2, and so on. The only atoms whose graph implementations introduce extra variables are the square root and the square. Thus, CvxLean introduces the variable t.0, corresponding to the expression sqrt (x - y), and the variable t.1, corresponding to the expression x^2. The values of rexpr<sub>n</sub>(x, y, t0, t1) corresponding to some of the expressions above are as follows:

| oexpr<sub>n</sub>(x, y) | x - y | sqrt (x - y) | -sqrt (x - y) | x^2 |
|---|---|---|---|---|
| rexpr<sub>n</sub>(x, y, t0, t1) | x - y | t.0 | -t.0 | t.1 |

The constraints c1 and c2 of the original problem translate to cone constraints c1' and c2' on the new variables, the constraint c3 is implied by the graph implementation of sqrt (x - y), and the graph implementations of sqrt (x - y) and x^2 become new cone constraints c4' and c5'. Thus the reduced problem is as follows:

```
optimization (x y t.0 t.1 : ℝ)
  minimize -t.0
  subject to
    c1' : zeroCone (2*x - 3 - y) -- 2*x - 3 - y = 0
    c2' : posOrthCone (2 - t.1) -- 2 - t.1 ≥ 0
    c4' : rotatedSoCone 0.5 (x - y) ![t.0] -- x - y ≥ t.0^2
    c5' : rotatedSoCone t.1 0.5 ![x] -- t.1 ≥ x^2
```
Here, ![t.0] and ![x] denote singleton vectors and the meaning of the cone constraints is annotated in the comments. For a description of the relevant conic forms, see the MOSEK modeling cookbook [31].
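The comments in the reduced problem can be checked against the cone definitions. The following sketch assumes that the three cones are defined as in the MOSEK modeling cookbook, with rotatedSoCone v w z denoting membership of (v, w, z) in the rotated second-order cone. The exact argument conventions in CvxLean may differ.

$$\begin{aligned} \text{zeroCone}(a) &\iff a = 0, \\ \text{posOrthCone}(a) &\iff a \ge 0, \\ \text{rotatedSoCone}(v, w, z) &\iff 2vw \ge \lVert z \rVert^2,\ v \ge 0,\ w \ge 0. \end{aligned}$$

With v = 0.5 and w = x − y, the last condition reads x − y ≥ t.0² together with x − y ≥ 0, which matches the comment on c4'. With v = t.1 and w = 0.5 it reads t.1 ≥ x², which matches the comment on c5'.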

### 4 Verifying the Reduction

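The reduction described in the previous section is essentially the same as the one carried out by CVXPY. The novelty of CvxLean is that it provides a formal justification that the reduction is correct. The goal of this section is to explain how we manage to construct a formal proof of that claim. In fact, given a problem P with an objective function f, CvxLean constructs a new problem Q with an objective g, together with the following additional pieces of data:

	- a function ϕ from the domain of P to the domain of Q such that, for any feasible point x of P, ϕ(x) is a feasible point of Q with g(ϕ(x)) ≤ f(x)
	- a function ψ from the domain of Q to the domain of P such that, for any feasible point y of Q, ψ(y) is a feasible point of P with f(ψ(y)) ≤ g(y).
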
These conditions guarantee that if y is a solution to Q then ψ(y) is a solution to P, because for any feasible point x of P we have

$$f(\psi(y)) \le g(y) \le g(\varphi(x)) \le f(x).$$

This shows that ψ is a reduction of P to Q, and the argument with P and Q swapped shows that ϕ is a reduction of Q to P. Moreover, whenever y is a solution to Q, instantiating x to ψ(y) in the chain of inequalities implies f(ψ(y)) = g(y). Similarly, when x is a solution to P, we have g(ϕ(x)) = f(x). So the conditions above imply that P has a solution if and only if Q has a solution, and when they do, the minimum values of the objective functions coincide. Below, we will refer to the data (ϕ, ψ) as a strong equivalence between the two problems.
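The argument above can be replayed formally in a few lines. The sketch below uses core Lean only, so transitivity of ≤ is passed as an explicit hypothesis leTrans instead of being obtained from a Preorder instance. The structure StrongEquiv and its field names are hypothetical and are not taken from CvxLean.

```lean
structure Minimization (D R : Type) where
  objFun : D → R
  constraints : D → Prop

structure FeasPoint {D R : Type} [LE R] (p : Minimization D R) where
  point : D
  feasibility : p.constraints point

structure Solution {D R : Type} [LE R] (p : Minimization D R) where
  point : D
  feasibility : p.constraints point
  optimality : ∀ y : FeasPoint p, p.objFun point ≤ p.objFun y.point

/-- Hypothetical packaging of the maps ϕ and ψ with their four properties. -/
structure StrongEquiv {D E R : Type} [LE R]
    (p : Minimization D R) (q : Minimization E R) where
  phi : D → E
  psi : E → D
  phi_feas : ∀ x, p.constraints x → q.constraints (phi x)
  psi_feas : ∀ y, q.constraints y → p.constraints (psi y)
  phi_le : ∀ x, p.constraints x → q.objFun (phi x) ≤ p.objFun x
  psi_le : ∀ y, q.constraints y → p.objFun (psi y) ≤ q.objFun y

/-- If `s` solves `q`, then `psi s.point` solves `p`, by the chain
`f (ψ y) ≤ g y ≤ g (ϕ x) ≤ f x`. -/
def StrongEquiv.toSolution {D E R : Type} [LE R]
    {p : Minimization D R} {q : Minimization E R}
    (leTrans : ∀ {a b c : R}, a ≤ b → b ≤ c → a ≤ c)
    (e : StrongEquiv p q) (s : Solution q) : Solution p where
  point := e.psi s.point
  feasibility := e.psi_feas s.point s.feasibility
  optimality := fun x =>
    leTrans (e.psi_le s.point s.feasibility)
      (leTrans
        (s.optimality ⟨e.phi x.point, e.phi_feas x.point x.feasibility⟩)
        (e.phi_le x.point x.feasibility))
```

The three nested steps correspond, from left to right, to the three inequalities in the displayed chain.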

To construct and verify such a strong equivalence between the original problem and the result of applying the transformation described in Section 3, we need to store additional information with each atom. Specifically, for each atomic function f(ā), that atom must provide solutions sol<sub>f</sub>(ā) to the graph implementation variables v̄, as well as formal proofs of the following facts:

	- solution feasibility: sol<sub>f</sub>(ā) satisfies the constraints constr<sub>f</sub>(ā, sol<sub>f</sub>(ā))
	- solution correctness: we have obj<sub>f</sub>(ā, sol<sub>f</sub>(ā)) = expr<sub>f</sub>(ā), where expr<sub>f</sub>(ā) is the expression representing f.

Finally, as noted in the previous section, the graph implementation implies the conditions needed for the reduction. Under the assumptions on ā and ā′ in the second case above, we also require a proof of vconds<sub>f</sub>(ā′). We refer to this as condition elimination.

For a concrete example, consider the atom for the concave function √a. In that case, vconds(a) is the requirement a ≥ 0, and expr(a), the Lean representation of the function, is given by Lean's sqrt function. The graph implementation adds a new variable v. The only constraint constr(a, v) is v² ≤ a, and the objective function is obj(a, v) = v. The solution function sol(a) returns √a when a is nonnegative and an arbitrary value otherwise. The atom for √· stores Lean proofs of all of the following:


More precisely, the atom stores the representation of the graph of the square root function as a cone constraint, and the properties above are expressed in those terms. These properties entail that sqrt is concave, but we do not need to prove concavity explicitly.
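Instantiating the general requirements for the square root suggests obligations of the following shape. This is a reconstruction under the conventions of this section, not a listing taken from CvxLean. In particular, the use of a second argument a′ with a′ ≤ a to account for monotonicity is an assumption modeled on the optimality goal shown for the logarithm atom in Section 5.

$$\begin{aligned} &\text{solution feasibility:} && a \ge 0 \implies (\sqrt{a})^2 \le a, \\ &\text{solution correctness:} && a \ge 0 \implies \text{obj}(a, \text{sol}(a)) = \sqrt{a}, \\ &\text{optimality:} && v^2 \le a' \text{ and } a' \le a \implies v \le \sqrt{a}, \\ &\text{condition elimination:} && v^2 \le a' \text{ and } a' \le a \implies a \ge 0. \end{aligned}$$

The optimality statement holds because v ≤ |v| = √(v²) ≤ √a, and condition elimination holds because 0 ≤ v² ≤ a′ ≤ a.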

Let the variables ȳ range over the domain of the original problem, P, and let the variables ȳ, z̄ be the augmented list of variables in the reduced problem, Q. We wish to construct a strong equivalence between P and Q. To that end, we need to define a forward map ϕ and a reverse map ψ. The definition of ψ is easy: we simply project each tuple ȳ, z̄ to ȳ. The definition of the forward map, ϕ, is more involved, since we have to map each tuple ȳ of values to an expanded tuple ȳ, z̄. The values of ȳ remain unchanged, so the challenge is to define, for each new variable z, an expression interp<sub>z</sub>(ȳ) to interpret it.

Recall that for each subexpression oexpr<sub>n</sub>(ȳ) in the original problem, corresponding to a node n, there is an expression rexpr<sub>n</sub>(ȳ, w̄) involving new variables from the reduced problem. Suppose a node n corresponds to an expression f(u<sub>1</sub>, . . . , u<sub>n</sub>) in the original problem, and the graph implementation of f introduces new variables v̄. For each v<sub>j</sub>, we need to devise an interpretation interp<sub>v<sub>j</sub></sub>(ȳ). To start with, sol<sub>f</sub> provides a solution to v<sub>j</sub> in terms of the arguments u<sub>1</sub>, . . . , u<sub>n</sub>. For each of these arguments, rexpr provides a representation in terms of the variables ȳ and other new variables. Composing these, we get an expression e(ȳ, w<sub>1</sub>, . . . , w<sub>ℓ</sub>) for v<sub>j</sub> in terms of the variables ȳ of the original problem and new variables w<sub>1</sub>, . . . , w<sub>ℓ</sub>. Recursively, we find interpretations interp<sub>w<sub>k</sub></sub>(ȳ) of each w<sub>k</sub>, and define interp<sub>v<sub>j</sub></sub>(ȳ) to be e(ȳ, interp<sub>w<sub>1</sub></sub>(ȳ), . . . , interp<sub>w<sub>ℓ</sub></sub>(ȳ)). In other words, we read off the interpretation of each new variable of the reduced problem from the intended solution to the graph equation, which may, in turn, require the interpretation of other new variables that were previously introduced.

In the end, the forward map ϕ is the function that maps the variables ȳ in the original problem to the tuple (ȳ, interp<sub>z<sub>1</sub></sub>(ȳ), . . . , interp<sub>z<sub>m</sub></sub>(ȳ)), where z<sub>1</sub>, . . . , z<sub>m</sub> are the new variables. To show that (ϕ, ψ) is a strong equivalence, we must show that for any feasible point ȳ of the original problem, ϕ(ȳ) is a feasible point of the reduced problem. This follows from the solution feasibility requirement above. We also need to show that if f(ȳ) is the objective function of the original problem and g(ȳ, z̄) is the objective function of the reduced problem, g(ϕ(ȳ)) ≤ f(ȳ). In fact, the solution correctness requirement enables us to prove the stronger property g(ϕ(ȳ)) = f(ȳ). Finally, we need to show that for any feasible point ȳ, z̄ of the reduced problem, the tuple ȳ is a feasible point of the original problem and f(ȳ) ≤ g(ȳ, z̄). To do that, we recursively use the optimality requirement to show rexpr<sub>n</sub>(ȳ, z̄) ≥ oexpr<sub>n</sub>(ȳ) whenever the node n marks a convex expression or an affine expression in the role of a convex expression, and rexpr<sub>n</sub>(ȳ, z̄) ≤ oexpr<sub>n</sub>(ȳ) whenever the node n marks a concave expression or an affine expression in the role of a concave expression.

A proof that the maps ϕ and ψ constructed above form a strong equivalence can be found in the extended version of this paper [7], but it is helpful to work through the example from Section 3 to get a sense of what the proof means. For this example, the forward map is ϕ(x, y) = (x, y, √(x − y), x²) and the reverse map is ψ(x, y, t0, t1) = (x, y). Assuming that (x, y) is a solution to the original problem, the fact that ϕ(x, y) satisfies c1' follows from c1, the fact that it satisfies c2' follows from c2, and the fact that it satisfies c4' and c5' follows from the fact that √(x − y) and x² are correct solutions to the graph implementation constraints. In this direction, g(ϕ(x, y)) = −√(x − y) = f(x, y). In the other direction, assuming that (x, y, t0, t1) is a solution to the reduced problem, the fact that (x, y) satisfies c1 follows from c1', the fact that it satisfies c2 follows from c2' and c5', and the fact that it satisfies c3 follows from c4'. Here we have f(ψ(x, y, t0, t1)) = −√(x − y) and g(x, y, t0, t1) = −t0, and the fact that the former is less than or equal to the latter follows from c4'.

### 5 Adding Atoms

One important advantage to using an interactive theorem prover as a basis for solving optimization problems is that it is possible to extend the atom library in a verified way. In a system like CVXPY, one declares a new atom with its graph implementation on the basis of one's background knowledge or a pen-and-paper proof that the graph implementation is correct and that the function described has the relevant properties over the specified domain. In CvxLean, we have implemented syntax with which any user can declare a new atom in Lean and provide formal proofs of these facts. The declaration can be made in any Lean file, and it becomes available in any file that imports that one as a dependency. Lean has a build system and package manager that handles dependencies on external repositories, allowing a community of users to share such mathematical and computational content.

For example, the declaration of the atom for the logarithm looks as follows:

```
declare_atom log [concave] (x : ℝ)+ : log x :=
  conditions (cond : 0 < x)
  implementationVars (t : ℝ)
  implementationObjective t
  implementationConstraints (c_exp : expCone t 1 x)
  solution (t := log x)
  solutionEqualsAtom by . . .
  feasibility (c_exp : by . . .)
  optimality by . . .
  conditionElimination (cond : by . . .)
```
The ellipses indicate places that are filled by formal proofs. Proof assistants like Lean allow users to write such proofs interactively in an environment that displays proof obligations, the local context, and error messages, all while the user types. For example, placing the cursor at the beginning of the optimality block displays the following goal:

x t : ℝ, c\_exp : expCone t 1 x ⊢ ∀ (y : ℝ), x ≤ y → t ≤ log y

In other words, given real values x and t and the relevant constraint in terms of the exponential cone, we need to prove that for every y ≥ x, we have t ≤ log(y).
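To see why this goal is provable, one can unfold the cone constraint. The sketch below assumes that expCone t 1 x unfolds to the exponential-cone condition 1 · e<sup>t/1</sup> ≤ x, following the convention of the MOSEK modeling cookbook. The precise definition in CvxLean may be stated differently.

$$e^{t} \le x \le y \implies t = \log(e^{t}) \le \log(y),$$

where the last step uses that the logarithm is increasing on the positive reals and that y ≥ x ≥ e<sup>t</sup> > 0.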

For the example we present in the next section, we had to implement the log-determinant atom [10, Example 9.5], whose arguments consist of a natural number n and a matrix A ∈ ℝ<sup>n×n</sup>. This function is represented in Lean by the atom expression expr<sub>log-det</sub> = log (det A), where the parameter n is implicit in the type of A. The curvature is specified to be concave, the monotonicity in n is auxiliary because we do not support the occurrence of optimization variables in this argument, and the monotonicity in A is neither because the value of log(det A) is neither guaranteed to increase nor guaranteed to decrease as A increases. (The relevant order here on matrices is elementwise comparison.) The correctness of the reduction requires the assumption that A is positive definite. Following CVXPY, we used the following graph implementation:

$$\begin{aligned} \text{maximize } & \sum\_{i} t\_i \\ \text{over } & t \in \mathbb{R}^n, \ Y \in \mathbb{R}^{n \times n} \\ \text{subject to } & (t, 1, y) \in \text{expcone} \\ & \begin{pmatrix} D & Z \\ Z^T & A \end{pmatrix} \text{ positive semidefinite} \end{aligned}$$

Here y is the diagonal of Y; Z is obtained from Y by setting all entries below the diagonal to 0; and D is obtained from Y by setting all entries off the diagonal to 0. Here, saying that the tuple (t, 1, y) is in the exponential cone means that e<sup>t<sub>i</sub></sup> ≤ y<sub>i</sub> for each i. Our implementation in CvxLean required proving that this graph implementation is correct. To do so, we formalized an argument in the MOSEK documentation.<sup>6</sup> This, in turn, required proving properties of the Schur complement, triangular matrices, Gram-Schmidt orthogonalization, and LDL factorization. Moreover, the argument uses the subadditivity of the determinant function, for which we followed an argument by Andreas Thom on MathOverflow.<sup>7</sup>

### 6 User-defined Reductions

An even more important advantage of using an interactive proof assistant as a framework for convex optimization is that, with enough work, users can carry out any reduction that can be expressed and justified in precise mathematical terms. As a simple example, DCP cannot handle an expression of the form exp(x)exp(y) in a problem, requiring us instead to write it as exp(x + y). But in CvxLean, we have the freedom to express the problem in the first form if we prefer to and then verify that the trivial reduction is justified:

```
reduction red/prob :
  optimization (x y : ℝ)
    maximize x + y
    subject to
      h : (exp x) * (exp y) ≤ 10 := by
  conv_constr => rw [←Real.exp_add]
```
<sup>6</sup> https://docs.mosek.com/modeling-cookbook/sdo.html#log-determinant

<sup>7</sup> https://mathoverflow.net/questions/65424/determinant-of-sum-of-positive-definite-matrices/65430#65430

Here the expression rw [←Real.exp\_add] supplies the short formal proof that exp(x)exp(y) can be replaced by exp(x + y).
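The rewriting step can be checked in isolation. The following sketch assumes a Lean 4 project in which Mathlib is available. It does not use CvxLean's reduction command, only the Mathlib lemma Real.exp_add on which the reduction relies.

```lean
import Mathlib

-- `Real.exp_add x y : Real.exp (x + y) = Real.exp x * Real.exp y`.
-- Rewriting right-to-left turns the product into a single exponential.
example (x y : ℝ) : Real.exp x * Real.exp y = Real.exp (x + y) := by
  rw [← Real.exp_add]

-- The same fact as a term, without tactics.
example (x y : ℝ) : Real.exp x * Real.exp y = Real.exp (x + y) :=
  (Real.exp_add x y).symm

-- A constraint stated with the product implies the DCP-compliant form.
example (x y : ℝ) (h : Real.exp x * Real.exp y ≤ 10) :
    Real.exp (x + y) ≤ 10 := by
  rw [Real.exp_add]
  exact h
```

In the last example the lemma is used left-to-right on the goal, which is the mirror image of rewriting the hypothesis right-to-left.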

Of course, this functionality becomes more important as the reductions become more involved. As a more substantial example, we have implemented a reduction needed to solve the covariance estimation problem for Gaussian variables [10, p. 355]. In this problem, we are given N samples y<sub>1</sub>, . . . , y<sub>N</sub> ∈ ℝ<sup>n</sup> drawn from a Gaussian distribution with zero mean and unknown covariance matrix R. We assume that the Gaussian distribution is nondegenerate, so R is positive definite and the distribution has density function

$$p\_R(y) = (2\pi)^{-n/2} \det(R)^{-1/2} \exp(-y^T R^{-1} y/2).$$

We want to estimate the covariance matrix R using maximum likelihood estimation, i.e., we want to find the covariance matrix that maximizes the likelihood of observing y<sub>1</sub>, . . . , y<sub>N</sub>. The maximum likelihood estimate for R is the solution to the following problem:

$$\text{maximize } \prod\_{k=1}^{N} p\_R(y\_k) \text{ over } R \text{ subject to } R \text{ positive definite.}$$

As stated, this problem has a simple analytic solution, namely, the sample covariance of y<sub>1</sub>, . . . , y<sub>N</sub>, but the problem becomes more interesting when one adds additional constraints, for example, upper and lower matrix bounds on R, or constraints on the condition number of R (see [10]). We can easily reduce the problem to maximizing the logarithm of the objective function above, but that is not a concave function of R. It is, however, a concave function of S = R<sup>−1</sup>, and common constraints on R translate to convex constraints on S. We can therefore reduce the problem above to the following:

$$\text{maximize } N \log(\det(S)) - \sum\_{k=1}^{N} y\_k^T S y\_k \text{ over } S \text{ subject to } S \text{ positive definite},$$

possibly with additional constraints on S. We express the sum using the sample covariance Y = (1/N) Σ<sub>k=1</sub><sup>N</sup> y<sub>k</sub> y<sub>k</sub><sup>T</sup> and the trace operator:

$$\begin{aligned} & \text{maximize } N \log(\det(S)) - N \cdot \text{tr}(YS^T) \text{ over } S\\ & \text{subject to } S \text{ positive definite} \end{aligned}$$

The problem can then be solved using disciplined convex programming. The constraint that S is positive definite is eliminated while applying the graph implementation of log(det(S)).
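The passage from the likelihood to the log-determinant objective can be made explicit. The following derivation uses only the density p<sub>R</sub> given above, the substitution S = R<sup>−1</sup> (so that det(R)<sup>−1/2</sup> = det(S)<sup>1/2</sup>), and the identity y<sup>T</sup>Sy = tr(yy<sup>T</sup>S). The symbol ℓ for the log-likelihood is introduced here for illustration.

$$\begin{aligned} \ell(S) = \log \prod\_{k=1}^{N} p\_R(y\_k) &= -\frac{Nn}{2} \log(2\pi) + \frac{N}{2} \log(\det(S)) - \frac{1}{2} \sum\_{k=1}^{N} y\_k^T S y\_k \\ &= -\frac{Nn}{2} \log(2\pi) + \frac{N}{2} \bigl( \log(\det(S)) - \text{tr}(YS) \bigr). \end{aligned}$$

The first term is constant and N/2 is positive, so maximizing ℓ amounts to maximizing log(det(S)) − tr(YS). Since S is symmetric, tr(YS) and tr(YS<sup>T</sup>) coincide.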

We have formalized these facts in Lean and used them to justify the reduction. An example with additional sparsity constraints on R can be found in CvxLean/Examples in our repository.

#### 7 Connecting Lean to a Conic Optimization Solver

Once a problem has been reduced to conic form, it can be sent to an external back-end solver. At this point, we must pass from the realm of precise symbolic representations and formal mathematical objects to the realm of numeric computation with floating point representations. We traverse our symbolic expressions, replacing functions on the reals from Lean's mathematical library with corresponding numeric functions on floats, for example associating the floating point exponential function Float.exp to the real exponential function Real.exp. Our implementation makes it easy to declare such associations with the following syntax: addRealToFloat : Real.exp := Float.exp.
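The following self-contained sketch illustrates the idea of such a translation on a toy expression language. It is not CvxLean's implementation: the type RExpr and the function RExpr.toFloat are hypothetical, and the real system traverses Lean expressions and looks up associations registered with addRealToFloat instead of matching on a fixed set of constructors.

```lean
/-- A toy syntax tree standing in for symbolic real-valued expressions. -/
inductive RExpr where
  | var   : String → RExpr
  | const : Float → RExpr
  | sub   : RExpr → RExpr → RExpr
  | sqrt  : RExpr → RExpr
  | exp   : RExpr → RExpr

/-- Replace each symbolic function by its floating point counterpart,
reading variable values from the environment `env`. -/
def RExpr.toFloat (env : String → Float) : RExpr → Float
  | .var x   => env x
  | .const c => c
  | .sub a b => RExpr.toFloat env a - RExpr.toFloat env b
  | .sqrt a  => Float.sqrt (RExpr.toFloat env a)
  | .exp a   => Float.exp (RExpr.toFloat env a)

-- The objective sqrt (x - y) of the running example.
def objective : RExpr := .sqrt (.sub (.var "x") (.var "y"))

-- The point reported by the solver in the example at the end of this section.
def point : String → Float
  | "x" => -1.414214
  | "y" => -5.828427
  | _   => 0.0

#eval RExpr.toFloat point objective  -- approximately 2.101
```

The translation is purely structural, which is why it is easy to extend but also why it carries no guarantee about rounding errors.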

This is one area where more verification is possible. We could use verified libraries for floating point arithmetic [2,9,19,44], we could use dual certificates to verify the results of the external solver, and we could carry out formal sensitivity analysis to manage and bound errors. Our current implementation is only designed to verify correctness up to the point where the problem is sent to the back-end solver, and to facilitate the last step, albeit in an unverified way.

We have implemented a solve command in CvxLean which takes an optimization problem prob in DCP form and carries out the following steps:


Finally, the results are added to the Lean environment. In the following example, the command solve so1 results in the creation of new Lean objects so1.reduced, so1.status, so1.value, and so1.solution. The first of these represents the conic-form problem that is sent to the back-end solver, while the remaining three comprise the resulting solution.

<sup>8</sup> https://docs.mosek.com/latest/rmosek/cbf-format.html

```
noncomputable def so1 :=
  optimization (x y : ℝ)
    maximize sqrt (x - y)
    subject to
      c1 : y = 2*x - 3
      c2 : x^2 ≤ 2
      c3 : 0 ≤ x - y

solve so1

#print so1.reduced -- shows the reduced problem
#eval so1.status -- "PRIMAL_AND_DUAL_FEASIBLE"
#eval so1.value -- 2.101003
#eval so1.solution -- (-1.414214, -5.828427)
```
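The reported value and solution can be cross-checked by hand. Constraint c1 gives x − y = 3 − x, which decreases in x, so under c2 the maximum is attained at x = −√2. The following Rust sketch recomputes the numbers under that reasoning; the function names `solution` and `objective` are introduced here only for illustration and are not part of CvxLean.

```rust
/// Candidate optimum of `so1`, obtained by hand: c1 gives y = 2x - 3, so the
/// objective is sqrt(3 - x), which is largest at the smallest feasible x.
fn solution() -> (f64, f64) {
    let x = -(2.0_f64).sqrt(); // left endpoint of the interval allowed by c2
    let y = 2.0 * x - 3.0; // constraint c1
    (x, y)
}

/// Objective function of `so1`.
fn objective(x: f64, y: f64) -> f64 {
    (x - y).sqrt()
}

fn main() {
    let (x, y) = solution();
    // c3 (0 <= x - y) holds, so the square root is well defined.
    assert!(x - y >= 0.0);
    println!("value = {:.6}", objective(x, y));
    println!("solution = ({:.6}, {:.6})", x, y);
}
```

The printed figures agree with the values returned by the solver to six decimal places.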
### 8 Related Work

Our work builds on decades of research on convex optimization [10,36,39,43], and most directly on the CVX family and disciplined convex programming [15,17,20,21,42]. Other popular packages include Yalmip [26].

Formal methods have been used to solve bounding problems [18,38], constraint satisfaction problems [16], and optimization problems [25]. This literature is too broad to survey here, but [14] surveys some of the methods that are used in connection with the verification of cyber-physical systems. Proof assistants in particular have been used to verify bounds in various ways. Some approaches use certificates from numerical packages; Harrison [24] uses certificates from semidefinite programming in HOL Light, and Magron et al. [27] and Martin-Dorel and Roux [28] use similar certificates in Coq. Solovyev and Hales use a combination of symbolic and numeric methods in HOL Light [40]. Other approaches have focused on verifying symbolic and numeric algorithms instead. For example, Muñoz, Narkawicz, and Dutle [34] verify a decision procedure for univariate real arithmetic in PVS and Cordwell, Tan, and Platzer [13] verify another one in Isabelle. Narkawicz and Muñoz [35] have devised a verified numeric algorithm to find bounds and global optima. Cohen et al. [11,12] have developed a framework for verifying optimization algorithms using the ANSI/ISO C Specification Language (ACSL) [5].

Although the notion of a convex set has been formalized in a number of theorem provers, we do not know of any full development of convex analysis. The Isabelle [37] HOL-Analysis library includes properties of convex sets and functions, including Carathéodory's theorem on convex hulls, Radon's theorem, and Helly's theorem, as well as properties of convex sets and functions on normed spaces and Euclidean spaces. A theory of lower semicontinuous functions by Grechuk [22] in the Archive of Formal Proofs [8] includes properties of convex functions. Lean's mathlib [29] includes a number of fundamental results, including a formalization of the Riesz extension theorem by Kudryashov and Dupuis and a formalization of Jensen's inequality by Kudryashov. Allamigeon and Katz have formalized a theory of convex polyhedra in Coq with an eye towards applications to linear optimization [3]. We do not know of any project that has formalized the notion of a reduction between optimization problems.

### 9 Conclusions

We have argued that formal methods can bring additional reliability and interactive computational support to the practice of convex optimization. The success of our prototype shows that it is possible to carry out and verify reductions using a synergistic combination of automation and user interaction.

The implementation of CvxLean is currently spread between two versions of Lean [32,33]. Lean 3 has a formal library, mathlib [29], which comprises close to a million lines of code and covers substantial portions of algebra, linear algebra, topology, measure theory, and analysis. Lean 4 is a performant programming language as well as a proof assistant, but its language is not backward compatible with that of Lean 3. All of the substantial programming tasks described here have been carried out in Lean 4, but we rely on a binary translation of the Lean 3 library and some additional results proved there. This arrangement is not ideal, but a source-level port of the Lean 3 library is already underway, and we expect to move the development entirely to Lean 4 in the near future.

There is still a lot to do. We have implemented and verified all the atoms needed for the examples presented in this paper, but these are still only a fraction of the atoms that are found in CVXPY. The DCP transformation currently leaves any side conditions that it cannot prove for the user to fill in, and special-purpose tactics, i.e. small-scale automation, could help dispel proof obligations like monotonicity. Textbooks often provide standard methods and tricks for carrying out reductions (e.g. [10, Section 4.1.3]), and these should also be supported by tactics in CvxLean. Our project, as well as Lean's library, would benefit from more formal definitions and theorems in convex analysis and optimization. We need to implement more efficient means of extracting numeric values for the back-end solver, and it would be nice to verify more of the numeric computations and claims. Finally, and most importantly, we need to work out more examples like the ones presented here to ensure that the system is robust and flexible enough to join the ranks of conventional optimization systems like CVXPY.

Acknowledgements Seulkee Baek did some preliminary experiments on connecting Lean 3 to external optimization solvers. Mario Carneiro and Gabriel Ebner advised us on how to formalize optimization problems and on Lean 4 metaprogramming. Steven Diamond helped us understand the world of convex optimization. We also had helpful discussions with Geir Dullerud, Paul Jackson, Florian Jarre, John Miller, Balasubramanian Narasimhan, Ivan Papusha, and Ufuk Topcu. Diamond, Jackson, and Parth Nobel provided helpful feedback on a draft of this paper. This work has been partially supported by the Hoskinson Center for Formal Mathematics at Carnegie Mellon University. Bentkamp has received funding from a Chinese Academy of Sciences President's International Fellowship for Postdoctoral Researchers (grant No. 2021PT0015). We thank the anonymous reviewers for their corrections and suggestions.

### References



## Specifying and Verifying Higher-order Rust Iterators

Xavier Denis and Jacques-Henri Jourdan

Université Paris-Saclay, CNRS, ENS Paris-Saclay, Laboratoire Méthodes Formelles, 91190, Gif-sur-Yvette, France jacques-henri.jourdan@cnrs.fr

Abstract. In Rust, programs are often written using iterators, but these pose problems for verification: they are non-deterministic, infinite, and often higher-order, effectful and built using adapters. We present a general framework for specifying and reasoning with Rust iterators in first-order logic. Our approach is capable of addressing the challenges set out above, which we demonstrate by verifying real Rust iterators, including a higher-order, effectful Map. Using the Creusot verification platform, we evaluate our framework on clients of iterators, showing it leads to efficient verification of complex functional properties.

Keywords: Rust · Deductive verification · Iterators · Closures

### 1 Introduction

The Rust language aims to empower systems software programmers by offering them safe and powerful linguistic abstractions to solve their problems. The most notorious of these abstractions, Rust's borrowing mechanism, enables safe usage of pointers without a garbage collector or performance penalty. A close second is perhaps Rust's iterator system, through which Rust provides composable mechanisms to express the traversal and modification of collections. Iterators also underlie Rust's for loop syntax, and are thus the primary way in which Rust developers write loops or interact with data structures. It is therefore essential for a verification tool for Rust to provide good support for iterators.

Rust iterators generate sequences of values. Most importantly, they are objects providing a method fn next(&mut self) -> Option<Self::Item>. This method takes a mutable reference (&mut self) to the iterator, allowing it to change its internal state, and optionally returns a value of type Self::Item, the type of the values generated by the iterator. If, instead of returning such a value, the iterator returns None, it means iteration has finished for now, though it may resume again later. Rust's for loops are just syntactic sugar for repeatedly calling next at the beginning of each iteration, until such a call returns None. For example, the following two pieces of code present a Rust loop for iterating over integers between 0 (included) and n (excluded), using a range iterator:

```
// Idiomatic for loop:
for i in 0..n { <body> }

// Desugared version:
let mut iter = 0..n;
loop { match iter.next() {
  None => break,
  Some(i) => <body>
} }
```
The first piece of code uses an idiomatic for loop, while the second shows its desugared version.
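To make the equivalence concrete, the following self-contained sketch instantiates `<body>` with a running sum and checks that both forms compute the same result. The function names are chosen here for illustration only, and the desugaring is the simplified one shown above (the compiler's actual desugaring additionally goes through `IntoIterator`).

```rust
/// Sums the integers in `0..n` with an idiomatic `for` loop.
fn sum_for(n: u32) -> u32 {
    let mut total = 0;
    for i in 0..n {
        total += i;
    }
    total
}

/// Same computation, with the loop written as explicit calls to `next`.
fn sum_desugared(n: u32) -> u32 {
    let mut total = 0;
    let mut iter = 0..n;
    loop {
        match iter.next() {
            None => break,
            Some(i) => total += i,
        }
    }
    total
}

fn main() {
    // 0 + 1 + 2 + 3 + 4 = 10 in both versions.
    assert_eq!(sum_for(5), 10);
    assert_eq!(sum_desugared(5), 10);
}
```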

Iterators present unique challenges for verification tools: indeed, because the use of iterators is pervasive in Rust, it is necessary to allow verification of code using iterators with as little interaction as possible. In particular, most common patterns such as iterating over integers in a given range or reading the elements of a vector should not need any annotation other than the loop invariants the user would write if not using iterators. On the other hand, Rust's iterator library is complex, with many features representing as many challenges for verification: iterators can be built from various data structures and modified through iterator adapters, which make it possible to create iterators from simpler ones, by, e.g., skipping the first few elements or applying a given function to each of the elements.

Consider the example below:

```
1 let mut cnt = 0;
2 let w = vec![1,2,3].iter().map(|x|{cnt += 1; x + 1}).collect();
3 assert_eq!(w, vec![2,3,4]); assert_eq!(cnt, 3);
```
On line 2, quite a lot happens at once. First, we produce an iterator over the elements of the vector vec![1,2,3] with the syntax .iter(), which we transform through a call to map. The method map is an iterator adapter: it returns a new iterator that calls the given closure on each of the elements generated by the underlying iterator, and forwards the value returned by the closure. Interestingly, the closure we pass to map captures mutable state: it modifies the variable cnt. Finally, the method collect gathers the elements generated into a new vector w.
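For readers who want to run the example, here is a self-contained variant. It assumes an explicit `Vec<i32>` annotation on `w`, which `collect` needs in order to know its target type, and wraps the code in a helper function `map_and_count` introduced here for illustration.

```rust
/// Runs the iterator chain from the example and returns the collected
/// vector together with the number of closure calls.
fn map_and_count() -> (Vec<i32>, i32) {
    let mut cnt = 0;
    // The closure mutably borrows `cnt` for as long as the `Map` adapter lives.
    let w: Vec<i32> = vec![1, 2, 3]
        .iter()
        .map(|x| {
            cnt += 1;
            x + 1
        })
        .collect();
    // `collect` consumed the adapter, so the borrow of `cnt` has ended.
    (w, cnt)
}

fn main() {
    let (w, cnt) = map_and_count();
    assert_eq!(w, vec![2, 3, 4]);
    assert_eq!(cnt, 3);
}
```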

We aim to require only lightweight annotations for verifying this kind of code: the appeal of iterator chains like the one on line 2 lies in their ergonomics, as they are compact and highly readable. For verification of iterator-based code to be successful, it must preserve these ergonomics. However, despite its apparent simplicity, this piece of code is challenging to verify: it combines higher-order functions and mutable state, uses potentially overflowing integers, and the assertions on line 3 check full functional behavior.

More generally, to support iterators, a verification tool for Rust needs to provide a specification scheme that both provides good ergonomics and overcomes the following technical challenges:


#### 1.1 Contributions

In order to reach this goal, we propose a new specification scheme for iterators in Rust. Our contributions can be summarized as follows:


an example. This provides a way to verify the functional correctness of programs using higher-order iterators, while requiring lightweight annotations.

– We provide a freely available<sup>1</sup> implementation of our proposal in Creusot [4]. This tool is a state-of-the-art verification platform for safe Rust code, allowing users to verify programs by adding contracts to their functions. This implementation extends Creusot's handling of for loops to benefit from structural invariants provided by the specification of iterators. We evaluate it in Section 7 on several benchmarks.

### 2 Specifications in Rust Programs

Before explaining the specification of iterators, we introduce the style of specification we use in this paper. One important aspect of specifications of imperative programs is their memory model, that is the way they handle pointers and mutations performed through them. Following previous work [7, 8, 4], we choose to leverage the non-aliasing guarantees of Rust's type system. Because of the non-aliasing guarantees, a given memory location can be mutated through at most one reference at a given point in time, excluding all "spooky actions at a distance" that are customary with pointer aliasing. Therefore, it is possible to give a mutable value semantics [13] to Rust programs, meaning that, even though Rust programs can perform mutation of memory, they can be reasoned about in a purely applicative manner. As a result, the Rust type Box<T> of heap-allocated pointers, and the Rust type &T of read-only references are simply modeled by wrappers over values of type T in our specifications. As shown in previous work [4, 7, 8], this interpretation of Rust programs is key to verifying complex Rust programs, because it avoids the use of any kind of separation logic or dynamic frames, which are challenging to automate.

The handling of mutable references &mut T requires caution. Such references represent the temporary borrow of ownership of a memory location, so that mutations through such a reference will be observed by the initial owner once the borrow ends. To correctly model the propagation of mutations from the mutable reference to the borrowed variable, this style of specification models a mutable reference r: &mut T as a pair of a current value \*r of type T (representing the current value pointed to by the reference) and of a prophecy ^r, representing the value the reference will point to when the borrow ends.

This prophetic interpretation makes it possible to give precise specifications to functions that manipulate mutable references. For example, the function push adding a new element at the end of a vector in place can be specified as follows:

```
#[ensures(@^self == (@*self).concat(Seq::singleton(v)))]
fn push(&mut self, v: T);
```
<sup>1</sup> https://github.com/xldenis/creusot/

Here, we use the operator @ to refer to the model of a vector, i.e., the mathematical sequence of its elements. The postcondition thus ensures that the content of the final vector pointed to by self, denoted by ^self, is modeled by the sequence of elements of the initial vector \*self, concatenated with the new element v.
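The relation stated by this postcondition can be observed at run time in plain Rust. The sketch below is not Creusot code: it takes a snapshot of the current value before the call and compares the final value with the concatenation afterwards. The helper `push_and_check` is a name introduced here for illustration.

```rust
/// Pushes `v` through a mutable borrow and checks, at run time, the relation
/// stated by the postcondition: final contents = initial contents ++ [v].
fn push_and_check(r: &mut Vec<i32>, v: i32) {
    // Snapshot of the current value `*r` before the call.
    let initial: Vec<i32> = r.clone();
    r.push(v);
    // Expected final value: the initial sequence followed by `v`.
    let expected: Vec<i32> = initial
        .iter()
        .copied()
        .chain(std::iter::once(v))
        .collect();
    assert_eq!(*r, expected);
}

fn main() {
    let mut owner = vec![1, 2];
    push_and_check(&mut owner, 3);
    // Once the borrow has ended, the owner observes the final value.
    assert_eq!(owner, vec![1, 2, 3]);
}
```

In the prophetic model the final value is available in the specification before the borrow ends, whereas this run-time check can only inspect it afterwards.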

We sometimes use purely mathematical functions and predicates, annotated with the #[logic] and #[predicate] attributes.

We use Rust traits to give composable specifications to iterators. They are analogous to Haskell's typeclasses, enabling ad-hoc polymorphism. For example, an order relation can be specified as a trait containing both a mathematical order relation with its laws (reflexivity, antisymmetry and transitivity), and a program function specified as returning the value prescribed by the logical predicate.

To aid in specification and verification of code, we use ghost code, code which exists only during verification and has no influence on runtime behavior.

### 3 Reasoning on Iteration

In this section, we present the general mechanism we use to specify iterators (Section 3.1), and how this kind of specification is used in a for loop (Section 3.2).

#### 3.1 Specifying Iterators

In Rust, the mechanism of iterators is captured by a trait named Iterator, whose simplified definition can be given as:

```
trait Iterator { type Item; fn next(&mut self) -> Option<Self::Item>; }
```
This trait describes the interface an iterator should implement: an iterator should give a type Item of generated elements, and should implement a method next which optionally returns the next generated element, and possibly mutates in place the internal state of the iterator through the mutable reference &mut self.

As can be seen in Figure 1, we extend<sup>2</sup> the iterator trait with the purely logical predicates produces and completed. We require that any implementation of this trait satisfies the laws produces\_refl and produces\_trans: such laws are lemmas stated as specifications of purely logical functions (i.e., the preconditions should imply the postconditions). The next method is then specified thanks to the two predicates. Any implementation of the Iterator trait needs to give a logical definition of produces and completed predicates, prove the laws, give a program definition for next and finally prove that it satisfies its specification.

```
trait Iterator {
  type Item;
  #[predicate] fn completed(&mut self) -> bool;
  #[predicate] fn produces(self, visited: Seq<Self::Item>, _: Self)
                   -> bool;

  #[law] // I.e., ∀ a, a ⇝^ε a
  #[ensures(a.produces(Seq::EMPTY, a))]
  fn produces_refl(a: Self);

  #[law] // I.e., ∀ a b c, a ⇝^v b ∧ b ⇝^w c ⇒ a ⇝^(v·w) c
  #[requires(a.produces(ab, b) && b.produces(bc, c))]
  #[ensures(a.produces(ab.concat(bc), c))]
  fn produces_trans(a: Self, ab: Seq<Self::Item>,
                    b: Self, bc: Seq<Self::Item>, c: Self);

  #[ensures(match result {
    None => self.completed(),
    Some(v) => (*self).produces(Seq::singleton(v), ^self)})]
  fn next(&mut self) -> Option<Self::Item>;
}
```
Fig. 1. Iterator trait extended with specification.

<sup>2</sup> In our implementation, to keep better compatibility with existing Rust code, we choose to define the iterator specification as a sub-trait of the Iterator trait from Rust's standard library, and to give the specification of next using Creusot's extern\_spec! mechanism. For simplicity, we present it here as a unique trait: the main idea of the specification is the same.

Iterators are specified as state machines: a value of an iterator type is seen as a state; produces(a, s, b) defines the transition relation (written $a \stackrel{s}{\leadsto} b$), and the predicate completed (written completed(·)) gives the set of final states. The completed predicate takes a mutable reference &mut self, which allows us to specify mutations that happen when an iterator returns None<sup>3</sup>. This added expressivity in the specification allows us to express properties of unfused iterators, which may intermittently produce None during iteration. The produces transition relation is annotated with sequences of generated values rather than with single values so that a user can reason about interesting properties of sequences as a whole rather than directly reasoning about the notion of transitive closure, which automated solvers do not handle well. The price to pay is the laws of reflexivity and transitivity, which implementers have to prove.
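As an illustration of why completed needs to see both the state before and after the call, here is a small unfused iterator in plain Rust. The type `Blinker` is hypothetical and introduced only for this sketch: its state changes on every call to `next`, including the calls that return `None`, and iteration resumes afterwards.

```rust
/// Hypothetical unfused iterator: alternates between a value and `None`.
struct Blinker {
    calls: u32,
}

impl Iterator for Blinker {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        // The state is mutated on every call, even when `None` is returned.
        self.calls += 1;
        if self.calls % 2 == 1 {
            Some(self.calls)
        } else {
            None
        }
    }
}

fn main() {
    let mut it = Blinker { calls: 0 };
    assert_eq!(it.next(), Some(1));
    assert_eq!(it.next(), None);
    assert_eq!(it.calls, 2); // the `None` call changed the state
    assert_eq!(it.next(), Some(3)); // iteration resumes after a `None`

    // The standard `fuse` adapter turns it into an iterator that keeps
    // returning `None` once the first `None` has been seen.
    let mut fused = Blinker { calls: 0 }.fuse();
    assert_eq!(fused.next(), Some(1));
    assert_eq!(fused.next(), None);
    assert_eq!(fused.next(), None);
}
```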

#### 3.2 Structural Invariant of for Loops

Part of the appeal of for loops is the structure they provide over the looping process. When a programmer sees a for, they can conclude that the body will be executed once for each element in the iterator. Unlike with while loops, it is not possible to decrement the loop index or otherwise perform unpredictable looping patterns. This informal reasoning can be formalized as a loop invariant, provided structurally by the for loop itself. The iterator at the i-th iteration is the result of calling next exactly i times on some initial state. In our formalism, given an initial iterator state initial and a current iterator state iter, we can state this invariant as $\exists p,\ \mathit{initial} \stackrel{p}{\leadsto} \mathit{iter}$. This invariant holds for any for loop over any iterator: it can be derived from the laws produces\_refl and produces\_trans.

<sup>3</sup> The predicate completed does not perform any side effects; it should rather be seen as a two-state predicate.

When using our extension to Creusot, every for loop benefits from this structural invariant: we change the way these loops are desugared into the more primitive loop construct, by adding ghost variables init\_iter and produced and the new invariant init\_iter.produces(produced, iter). More precisely, a simple for loop for x in iter {<body>} is desugared into:

```
let init_iter = ghost! { iter };
let mut produced = ghost! { Seq::EMPTY };
#[invariant(structural, init_iter.produces(produced, iter))]
loop { match iter.next() {
  None => break,
  Some(x) => {
    produced = ghost! { produced.concat(Seq::singleton(x)) };
    <body> },
} }
```
Interestingly, the ghost variable produced can be referred to in a user invariant to relate the state of the loop with the iteration state. In the piece of code in Figure 2, we use a variable count to count the number of elements generated by an iterator, and use such an invariant to verify its intended meaning.

```
let mut count = 0;
#[invariant(count_is_n, @count == produced.len())]
for i in 0..n { count += 1; assert!(0 <= i && i < n); }
assert!(n < 0 || count == n);
```
Fig. 2. A simple for loop using ranges.
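The effect of the desugaring can be imitated in plain Rust by turning the ghost variable into an ordinary vector and asserting the user invariant at the head of each iteration. This is only a run-time sketch of what Creusot proves statically; the function `count_range` is a name chosen here for illustration.

```rust
/// Counts the elements of `0..n` with the desugared loop, keeping the ghost
/// variable `produced` as a real vector so the invariant can be asserted.
fn count_range(n: usize) -> usize {
    let mut iter = 0..n;
    let mut produced: Vec<usize> = Vec::new(); // ghost state made concrete
    let mut count = 0;
    loop {
        // User invariant `count_is_n` from Figure 2.
        assert_eq!(count, produced.len());
        match iter.next() {
            None => break,
            Some(i) => {
                produced.push(i);
                count += 1;
                assert!(i < n);
            }
        }
    }
    count
}

fn main() {
    assert_eq!(count_range(4), 4);
    assert_eq!(count_range(0), 0);
}
```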

### 4 Examples of Specifications of Simple Iterators

In Section 3, we have presented a general framework to specify iterators and use them in for loops. In this section, we present several simple examples of iterators defined in this framework.

#### 4.1 The Range Iterator

We start with a simple Range iterator, whose purpose is to iterate over the integers in a given range. The notation a..b used idiomatically in Rust is syntactic sugar for this kind of iterator. The original definition from the Rust standard library is generic over the type of integers used, but, for the sake of simplicity, we use a monomorphic version here:

struct Range { start: usize, end: usize }

If self.start ≥ self.end, the next method returns None. Otherwise, it increments self.start and returns Some of the initial value of self.start. Note that the upper bound of the range, end, is excluded from the iteration.

In order to instantiate our iterator specification scheme with Range, we use the produces and completed predicates defined by:

$$\begin{aligned} r \stackrel{v}{\leadsto} r' \quad &\triangleq \quad |v| = r'.\mathsf{start} - r.\mathsf{start} \wedge r.\mathsf{end} = r'.\mathsf{end} \\ &\qquad \wedge\ (|v| > 0 \Rightarrow r'.\mathsf{start} \leq r'.\mathsf{end}) \\ &\qquad \wedge\ \forall i \in [0, |v| - 1],\ v[i] = r.\mathsf{start} + i \\ \mathit{completed}(r) \quad &\triangleq \quad \ast r = {}^{\wedge} r \wedge (\ast r).\mathsf{start} \geq (\ast r).\mathsf{end} \end{aligned}$$

Transitivity and reflexivity are easily verified.

Rust's standard library also contains ranges whose upper bound is included rather than excluded, and ranges without an upper bound. They can all be specified using similar techniques.

Note that with these definitions, the structural invariant of for loops directly implies that the loop index (the last produced value) is in the range. In addition, if the range is non-empty, one can deduce that the last iterated value is end − 1. These two properties usually require an additional invariant if the loop is encoded using the while construct. For an illustration, consider Figure 2.
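The two predicates can also be read as executable checks. The sketch below is an assumption-laden run-time rendering, not the Creusot definitions: `produces`, `completed` and `next` are free functions introduced here, completed is read as "the state is unchanged and start ≥ end" (matching the description of next above), and machine integers stand in for mathematical ones.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
struct Range {
    start: usize,
    end: usize,
}

/// Executable reading of the `produces` relation from `r` to `r2` along `v`.
fn produces(r: Range, v: &[usize], r2: Range) -> bool {
    r2.start >= r.start // guards the subtraction on machine integers
        && v.len() == r2.start - r.start
        && r.end == r2.end
        && (v.is_empty() || r2.start <= r2.end)
        && v.iter().enumerate().all(|(i, &x)| x == r.start + i)
}

/// Executable reading of `completed`, given the states before and after `next`.
fn completed(before: Range, after: Range) -> bool {
    before == after && before.start >= before.end
}

/// The `next` method described in the text.
fn next(r: &mut Range) -> Option<usize> {
    if r.start >= r.end {
        None
    } else {
        let v = r.start;
        r.start += 1;
        Some(v)
    }
}

fn main() {
    let init = Range { start: 2, end: 5 };
    let mut state = init;
    let mut produced = Vec::new();
    loop {
        let before = state;
        match next(&mut state) {
            Some(v) => {
                assert!(produces(before, &[v], state)); // postcondition of `next`
                produced.push(v);
            }
            None => {
                assert!(completed(before, state));
                break;
            }
        }
        assert!(produces(init, &produced, state)); // structural invariant
    }
    assert_eq!(produced, vec![2, 3, 4]); // the last value is end - 1
    assert!(produces(init, &[], init)); // one instance of reflexivity
}
```

Such a check exercises the definitions on one trace only; the laws themselves are universally quantified and are established by proof in the framework.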

#### 4.2 IterMut: Mutating Iteration Over a Vector

Our approach to iterators can be used to iterate over elements of a vector. But instead of presenting the simple case of a read-only vector iterator, we study a more general iterator, IterMut, which permits both reading and writing vector elements while iterating; the simpler case of the read-only iterator uses the same ideas.

This iterator produces mutable references for each element of a vector in turn. The state of this iterator is a mutable reference to the slice (i.e., a fragment of a vector) of elements that remain to be iterated:

```
struct IterMut<'a, T> { inner: &'a mut [T] }
```
To define the production relation of IterMut, we use a helper function tr, which transposes a mutable reference to a slice into a sequence of mutable references to its elements. Its defining property is:

$$|tr(s)| = |s| \ \land \ \forall i \in [0, |s| - 1], \ \ast(tr(s)[i]) = (\ast s)[i] \ \land \ {}^{\wedge}(tr(s)[i]) = ({}^{\wedge} s)[i]$$

With the help of tr, the produces and completed relations of IterMut are simple to express:

$$\begin{array}{rcl} it \stackrel{v}{\leadsto} it' & \triangleq & tr(it.\mathtt{inner}) = v \cdot tr(it'.\mathtt{inner})\\ \mathit{completed}(it) & \triangleq & \ast it = {}^{\wedge} it \land |(\ast it).\mathtt{inner}| = 0 \end{array}$$

This means that the iterator it produces a sequence v of mutable references, which must be an initial segment of tr(it.inner), and reaches a final state it′ such that tr(it′.inner) is the sequence of mutable references that are left to be generated. Such an iterator is completed when the inner slice is empty.

This compact specification is enough to reason about mutating through the returned pointers as in the following example:

```
#[invariant(all_zero, forall<i: Int> 0 <= i && i < produced.len()
                                  ==> @^produced[i] == 0)]
for x in v.iter_mut() { *x = 0; }
assert!{ forall<i: Int> 0 <= i && i < (@v).len() ==> @(@^v)[i] == 0 }
```
That is, we are able to prove with a simple loop invariant that this loop sets to 0 all the elements of the vector.

The reasoning that occurs to prove this program is as follows. First, at the end of a loop iteration, we know that the final value of the borrow x is equal to 0 since we have just written 0 and this value will not change since x goes out of scope. Together with the invariant of the preceding iteration, this is enough to prove that the invariant is maintained. Second, after the loop has executed, the final iterator state is empty, so we know produced contains the complete sequence of borrows to elements of v. But, thanks to the loop invariant, the prophetic value of each of these borrows is 0. So we can deduce that the final content of v is a sequence of zeros.
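The behavior being verified can be exercised in plain Rust. The sketch below shows two variants, both named here for illustration: the loop from the example, and a version that first collects the whole sequence of produced mutable references, which mirrors the view of the iterator as producing a sequence of borrows whose final values determine the final content of the vector.

```rust
/// Sets every element to zero through the references produced by `iter_mut`.
fn zero_all(v: &mut Vec<u32>) {
    for x in v.iter_mut() {
        *x = 0;
    }
}

/// Same effect, but first collects the complete sequence of produced
/// mutable references and only then writes through each of them.
fn zero_all_collected(v: &mut Vec<u32>) {
    let produced: Vec<&mut u32> = v.iter_mut().collect();
    for x in produced {
        *x = 0;
    }
}

fn main() {
    let mut a = vec![3, 1, 4];
    zero_all(&mut a);
    assert_eq!(a, vec![0, 0, 0]);

    let mut b = vec![1, 5, 9, 2];
    zero_all_collected(&mut b);
    // After all borrows have ended, the owner sees the written values.
    assert_eq!(b, vec![0, 0, 0, 0]);
}
```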

#### 4.3 Iterator Adapters

Because all iterators implement the same trait Iterator which gives them a specification, we can easily build adapters which wrap and transform the behavior of an iterator.

It is important to note that, following Rust's standard library, these adapters are generic over the type of the underlying iterator; individual values of a type cannot have different predicates. While the verification tool cannot know the concrete definitions of produces or completed for the wrapped iterator, it knows it must satisfy the Iterator trait interface.

The simplest example is Take<I> (where I is another iterator), which truncates an iterator to produce at most n elements. The state of Take<I> is a record with two fields: a counter n for the remaining elements to take and an iterator iter to take from. The specification predicates of Take<I> are defined as follows:

$$\begin{array}{rcl} it \stackrel{v}{\leadsto} it' & \triangleq & it.\mathtt{iter} \stackrel{v}{\leadsto} it'.\mathtt{iter} \wedge it.\mathtt{n} = it'.\mathtt{n} + |v|\\ \mathit{completed}(it) & \triangleq & (\ast it).\mathtt{n} = 0 \wedge \ast it = {}^{\wedge} it\\ & & \vee\ (\ast it).\mathtt{n} > 0 \wedge (\ast it).\mathtt{n} = ({}^{\wedge} it).\mathtt{n} + 1 \wedge \mathit{completed}(it.\mathtt{iter}) \end{array}$$

The subtle definition here is that of completed(it): if the counter is 0, then next does nothing. But, following Rust's implementation, if the counter is not 0, then it is first decremented even if the call to the underlying iterator returns None.

Again, when instantiated to a specific underlying iterator type, we can substitute the definitions of $(\leadsto)$ and $\mathit{completed}(-)$ for the underlying iterator to get concrete definitions of these predicates for Take<I>, which are easier for automated solvers to handle.
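The two cases of completed for Take<I> can be observed on a simplified model of the adapter. The type `MyTake` below is a hypothetical stand-in written for this sketch (the standard library's `Take` keeps its fields private); its `next` follows the behavior described above, decrementing the counter before consulting the underlying iterator.

```rust
/// Simplified, hypothetical model of the `Take` adapter state.
struct MyTake<I> {
    n: usize,
    iter: I,
}

impl<I: Iterator> Iterator for MyTake<I> {
    type Item = I::Item;

    fn next(&mut self) -> Option<I::Item> {
        if self.n != 0 {
            // The counter is decremented first, so it decreases even when
            // the call to the underlying iterator returns `None`.
            self.n -= 1;
            self.iter.next()
        } else {
            None
        }
    }
}

fn main() {
    // Counter not 0 and inner iterator exhausted: n goes from 2 to 1.
    let mut t = MyTake { n: 2, iter: std::iter::empty::<u8>() };
    assert_eq!(t.next(), None);
    assert_eq!(t.n, 1);

    // Counter 0: `next` returns `None` and leaves the state unchanged.
    let mut t = MyTake { n: 0, iter: 1..4 };
    assert_eq!(t.next(), None);
    assert_eq!(t.n, 0);
    assert_eq!(t.iter, 1..4);

    // Ordinary use: at most `n` elements are produced.
    let t = MyTake { n: 2, iter: 1..4 };
    assert_eq!(t.collect::<Vec<_>>(), vec![1, 2]);
}
```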

Another adapter is Skip<I>, whose goal is to skip the first n elements of an iterator. Similarly to Take<I>, the state is a record with two fields: a number n of elements to skip and an underlying iterator iter.

The relation of Skip<I> is defined as follows:

$$\begin{array}{rcl} it \stackrel{v}{\leadsto} it' & \triangleq & v = \varepsilon \land it = it'\\ & & \lor\ it'.\mathtt{n} = 0 \land |v| > 0 \land \exists\, w,\ |w| = it.\mathtt{n} \land it.\mathtt{iter} \stackrel{w \cdot v}{\leadsto} it'.\mathtt{iter} \end{array}$$

The first disjunct is needed to ensure reflexivity of $(\leadsto)$. The second disjunct describes what happens after a non-empty sequence of calls. If we produced some sequence of elements v, then we must have been able to skip n elements first, which we existentially quantify over.

If the Skip<I> iterator is completed, the underlying iterator has also completed, but potentially after having generated some skipped elements that we existentially quantify over:

$$\begin{array}{rcl} \mathit{completed}(it) & \triangleq & \exists\, w\ i,\ ({}^{\wedge} it).\mathtt{n} = 0 \land |w| \le (\ast it).\mathtt{n}\\ & & \land\ (\ast it).\mathtt{iter} \stackrel{w}{\leadsto} \ast i \land \mathit{completed}(i) \land {}^{\wedge} i = ({}^{\wedge} it).\mathtt{iter} \end{array}$$

Using Skip<I> and Take<I> we are able to prove an algebraic property of iterators: if we take n elements and then skip n elements from that iterator, we must necessarily get the empty iterator.

assert!(iter.take(n).skip(n).next().is\_none())

This property is easy to prove from the composition of both production relations.
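The property can also be exercised on concrete iterators with the standard library adapters. The following sketch only tests a few instances and is no substitute for the proof; the helper `take_then_skip_is_empty` is a name introduced here.

```rust
/// Checks on a concrete iterator that taking `n` elements and then
/// skipping `n` elements leaves nothing to produce.
fn take_then_skip_is_empty<I: Iterator>(iter: I, n: usize) -> bool {
    iter.take(n).skip(n).next().is_none()
}

fn main() {
    // More elements available than taken.
    assert!(take_then_skip_is_empty(0..10, 3));
    // Fewer elements available than taken.
    assert!(take_then_skip_is_empty(vec!['a', 'b'].into_iter(), 5));
    // Infinite underlying iterator, n = 0.
    assert!(take_then_skip_is_empty(std::iter::repeat(1), 0));
}
```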

### 5 Closures in Rust

Unlike traditional functional languages, Rust has no function type for closures. Two closures, even with identical bodies, are not of the same type: closures are each given a unique, anonymous type representing the captured environment. This design is motivated by the need to fully resolve closures during compilation: the compiler is always able to identify exactly which piece of code is used at every call site. To abstract over closures and write higher-order functions, Rust provides three traits that the closure type may implement: FnOnce, FnMut, and Fn. They describe the different ways a closure's environment can be passed during a call: by ownership, by mutable reference or by immutable reference. The compiler automatically provides the relevant instances when a user writes a closure.
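The three ways of passing the environment can be seen in a short plain-Rust sketch. The higher-order functions below are named for illustration only; each one abstracts over closures through one of the three traits.

```rust
/// Consumes the closure: it can be called at most once.
fn call_once<F: FnOnce() -> String>(f: F) -> String {
    f()
}

/// Calls the closure twice through a mutable binding.
fn call_twice_mut<F: FnMut() -> u32>(mut f: F) -> u32 {
    f();
    f()
}

/// Calls the closure twice through a shared reference.
fn call_twice_ref<F: Fn() -> usize>(f: &F) -> usize {
    f() + f()
}

fn main() {
    let s = String::from("moved");
    // Returning `s` moves it out of the environment: FnOnce only.
    assert_eq!(call_once(move || s), "moved");

    let mut counter = 0;
    // Mutates captured state: FnMut.
    assert_eq!(call_twice_mut(|| { counter += 1; counter }), 2);
    assert_eq!(counter, 2);

    let data = vec![1, 2, 3];
    // Only reads captured state: Fn.
    assert_eq!(call_twice_ref(&|| data.len()), 6);
}
```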

Traditionally, verifying higher-order code with mutable state has needed separation logic or dynamic frames, but because of Rust's mutable value semantics we can avoid these tools. Instead, we provide a specification for higher-order functions in first-order logic, which generates simple verification conditions (see code of Section 7). Specifically, we extend FnOnce, FnMut, and Fn with logical predicates that capture the pre- and postconditions of closures. We begin by considering the simplest case, FnOnce:

```
pub trait FnOnce<Args> {
  #[predicate] fn precondition(self, a: Args) -> bool;
  #[predicate] fn postcondition_once(self, a: Args, res: Self::Output)
                   -> bool;
  #[requires(self.precondition(args))]
  #[ensures(self.postcondition_once(args, result))]
  fn call_once(self, args: Args) -> Self::Output;
}
```
The predicates precondition and postcondition\_once refer to the specification added to the call\_once method used to call the closure.

A call to a FnOnce closure consumes it. On the other hand, FnMut allows a mutable closure to be called multiple times. Here is our extended FnMut trait:

```
pub trait FnMut<Args> : FnOnce<Args> {
  #[predicate] fn unnest(self, _: Self) -> bool;
  #[ensures(self.unnest(self))]
  #[law] fn unnest_refl(self);
  #[requires(self.unnest(b) && b.unnest(c))]
  #[ensures(self.unnest(c))]
  #[law] fn unnest_trans(self, b: Self, c: Self);
  #[predicate] fn postcondition_mut(&mut self, _: Args, _: Self::Output)
                   -> bool;
  #[requires((*self).precondition(arg))]
  #[ensures(self.postcondition_mut(arg, result))]
  fn call_mut(&mut self, arg: Args) -> Self::Output;
[...] }
```
Because every FnMut closure is also an FnOnce closure, we can reuse the precondition predicate to specify call\_mut. However, we need a new predicate for the richer postconditions that become possible: since the closure is called using a mutable borrow, the postcondition can specify changes made to captured variables.

Rust compiles closures via closure conversion: the state of each closure becomes a struct holding references to all captured variables. However, this struct can only be modified in a restricted fashion: we can only mutate the values pointed to by the captures, and not the captures themselves. In particular, this means the prophecies of captures remain constant. We capture this property in an unnesting predicate F::unnest(a, b). It expresses that the prophecies in the state of type F have not changed from a to b. This property is both reflexive and transitive, which we capture via laws. The unnesting predicate is essential to link the states of a closure throughout repeated calls. Without it we would lose track of the contained prophecies.
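To make closure conversion concrete, here is a hand-written counterpart of a closure that increments a captured counter. It is a sketch under our own naming (`CountingClosure`, `count_calls`); the compiler generates an anonymous type rather than a named struct.

```rust
/// Hand-written counterpart of `|x: u32| { *cnt += 1; x + 1 }`,
/// where `cnt` is a captured mutable reference. Names are hypothetical.
struct CountingClosure<'a> {
    cnt: &'a mut usize, // the capture: a reference into the environment
}

impl<'a> CountingClosure<'a> {
    /// Plays the role of `FnMut::call_mut`: it may write through the
    /// capture (`*self.cnt`) but never re-points `self.cnt` itself.
    fn call_mut(&mut self, x: u32) -> u32 {
        *self.cnt += 1;
        x + 1
    }
}

fn count_calls(inputs: &[u32]) -> (Vec<u32>, usize) {
    let mut cnt = 0;
    let mut closure = CountingClosure { cnt: &mut cnt };
    let outputs: Vec<u32> = inputs.iter().map(|&x| closure.call_mut(x)).collect();
    // The borrow held by `closure` has ended, so `cnt` is readable again.
    (outputs, cnt)
}
```

Between two calls, the field `cnt` designates the same borrow; only the value it points to changes. This is the informal counterpart of what the unnesting predicate records.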

In addition to these predicates, our FnMut trait contains laws we elided: unnest is implied by postcondition\_mut, and postcondition\_mut is linked to the postcondition predicate of the FnOnce trait.

Finally, Fn imposes that the closure is immutable. Each call upholds the postcondition and leaves the state intact. Again, in the following, we elided laws relating postcondition, postcondition\_mut and postcondition\_once:

```
pub trait Fn<Args> : FnMut<Args> {
  #[predicate] fn postcondition(&self, _: Args, _: Self::Output) -> bool;
  #[requires((*self).precondition(arg))]
  #[ensures(self.postcondition(arg, result))]
  fn call(&self, arg: Args) -> Self::Output;
[...] }
```
### 6 A Higher-order Iterator Adapter: Map

The challenge with the specification of Map is proving the preconditions of the closure being called. Map treats the closure opaquely: it cannot tell what the concrete pre- and post- conditions are, so the justification for the precondition must come from elsewhere. To help work through this, we use a thought experiment where we see Map implemented as a loop with a yield instruction to generate elements, in the style of e.g., Python generators:

```
fn map<I : Iterator, B, F: FnMut(I::Item) -> B>(iter: I, mut f: F) {
  // `yield` is generator-style pseudocode here, not stable Rust.
  for a in iter { yield (f)(a) }
}
```
To verify it, we need f.precondition(a) to be true at each iteration, so we need an invariant which implies it. This exposes the key property that must be true of our closure: the postcondition at iteration n must be able to establish the precondition for iteration n + 1. In the vocabulary of iterators:

$$it \stackrel{s \cdot e_1 \cdot e_2}{\leadsto} i' \land pre(*f, e_1) \land post(f, e_1, r) \Rightarrow pre(\hat{}f, e_2)$$
This expresses that if we eventually produce an element e<sub>1</sub> which satisfies the precondition of the initial closure \*f, then combined with the postcondition of f, we must be able to establish the precondition for the final closure ˆf with the following element e<sub>2</sub>. Quantifying over a prefix s in the iteration from a known initial state i ensures this property holds for all possible subsequent iterations.

To encode this property in Map, we use a type invariant, which allows specifying a property that values of a type must uphold. Values of type Map are records with two fields: field func contains the closure state, and field iter contains the underlying iterator. The invariant states that (1) the precondition for the next call will be verified; (2) the preservation property above holds for the current state it; (3) these two invariants are reestablished if the underlying iterator returns None (this is usually trivial since the underlying iterator often is fused: it cannot generate new elements once it returns None); and (4) the type invariant of the underlying iterator holds.

These invariants are initially required as a precondition of the map method used to create the Map iterator. In order to be tackled by automated solvers, this verification condition needs to be unfolded: it is therefore crucial that closures and their pre- and post- conditions are statically resolved thanks to the unique anonymous closure types in Rust.

The specification predicates for Map can now be stated:

$$\begin{aligned} it \stackrel{v}{\leadsto} it' \triangleq{} & \exists v' \ fs, \ |v'| = |fs| = |v| \land it.\mathbf{iter} \stackrel{v'}{\leadsto} it'.\mathbf{iter} \\ & \land (it.\mathbf{func} = *fs[0] \land \hat{}fs[0] = *fs[1] \land \cdots \land \hat{}fs[n] = it'.\mathbf{func}) \\ & \land \forall i \in [0, |v|-1], \ pre(*fs[i], v'[i]) \land post(fs[i], v'[i], v[i]) \\ & \land unnest(it.\mathbf{func}, it'.\mathbf{func}) \end{aligned}$$

$$\text{completed}(it) \triangleq \text{completed}(it.\mathbf{iter}) \land (*it).\mathbf{func} = (\hat{}it).\mathbf{func}$$

In the production relation ⇝, we quantify existentially over two pieces of information: the sequence of values v′ produced by the underlying iterator and the sequence fs of mutable references to the states that the closure traverses. We require that fs forms a chain, the final state of each element being the same as the current value of the following one. Finally, we require the closure pre- and post- conditions for every iteration, and that the first and last states are related by the unnesting relation. The definition of completed(−), on the other hand, straightforwardly states that the underlying iterator is completed.

Interestingly, the user of this specification can use the precondition of the closure to encode closure invariants that she wishes to maintain along the iteration (as with loop invariants). This specification for Map allows us to specify many use cases, so long as the supplied closure is "history-free": its specification does not depend on the sequence of previously generated values, like in x.map(|a : u32| a + 5). While this is certainly the most common usage of map, we sometimes need a more powerful specification.

Extending Map With Ghost Information. If we attempt to use the previous specification of Map to verify the counter example of Section 1, we will rapidly encounter an issue: to establish that cnt properly counts the number of iterations would require a (manual) induction on the iterated sequence. While the prior specification allows the closure to specify the impact of an immediate call, it has no way of reasoning on the position in the iteration. In our prior thought experiment using a generator, we have no way of writing an invariant which depends on produced, as we allowed for usual for loops.

To make the verification of this kind of code simpler, we extend the signature of Map to provide to the closure the sequence of elements generated by the underlying iterator since the creation of the mapping iterator object. This information does not change the behavior of the program: we make it ghost, so it can only be used in specifications.

The extended version, MapExt, is thus given an additional ghost field, produced, containing this sequence. The relation (⇝) is extended to account for this ghost information, by adding a conjunct stating that it′.produced = it.produced·v′ and passing the additional ghost parameter it.produced·v′[0..i−1] to the pre- and post- conditions. The completed() relation is extended by adding the conjunct (ˆit).produced = ε (the produced field is reset when the iterator returns None). The type invariants are adapted accordingly.

This extra information avoids the need for an explicit induction after the fact to establish that we have properly counted the number of iterations: the postcondition of the last call to next is enough. This mechanism is useful in a wide variety of situations, beyond reasoning on the length of the sequence.
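The role of the produced sequence can be illustrated with an executable model in which the ghost field is made concrete. This is only a sketch under our own assumptions: in the verified version, produced exists in specifications only, whereas here it is a real `Vec` and the closure invariant is checked with a runtime assertion. The closure signature taking a slice and the function `counter` are our choices.

```rust
/// Executable model of MapExt: the ghost field `produced` is a real Vec
/// here so that the closure can inspect it at run time.
struct MapExt<I: Iterator, F> {
    iter: I,
    func: F,
    produced: Vec<I::Item>,
}

impl<I, B, F> Iterator for MapExt<I, F>
where
    I: Iterator,
    I::Item: Clone,
    F: FnMut(I::Item, &[I::Item]) -> B,
{
    type Item = B;

    fn next(&mut self) -> Option<B> {
        match self.iter.next() {
            Some(e) => {
                // The closure sees the elements produced before `e`.
                let r = (self.func)(e.clone(), self.produced.as_slice());
                self.produced.push(e);
                Some(r)
            }
            None => {
                // `produced` is reset when the iterator returns None.
                self.produced.clear();
                None
            }
        }
    }
}

/// Counts the mapped elements; the closure invariant `cnt == |produced|`
/// is checked dynamically instead of being proved.
fn counter(v: &[u32]) -> (Vec<u32>, usize) {
    let mut cnt = 0;
    let it = MapExt {
        iter: v.iter().copied(),
        func: |x: u32, produced: &[u32]| {
            assert_eq!(cnt, produced.len());
            cnt += 1;
            x
        },
        produced: Vec::new(),
    };
    let w: Vec<u32> = it.collect();
    (w, cnt)
}
```

Because the invariant is stated on the position in the iteration, the final value of `cnt` follows from the last call alone, without an induction over the whole sequence.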

### 7 Evaluation

In this section we measure the performance of both the proofs of iterators and their clients, using Creusot [4], a verification tool for Rust programs that requires some annotations to establish functional correctness. Verification is performed by translating annotated Rust code into a pure, first-order functional program. Then, Creusot uses Why3 [15] to generate verification conditions, which are discharged using automated solvers such as CVC5, Z3 or Alt-Ergo.

The results in Figure 3 were gathered using a MacBook Pro with an M1 Pro CPU and 32 GB of RAM, running macOS 12.2. Why3 was limited to using four provers simultaneously among Z3 4.11.2, CVC5 1.0.2, and Alt-Ergo 2.4.1.

Why3 supports proof transformations: manual tactics which can be used in combination with automated solvers. Because we wish to obtain ergonomic specifications which work well with automation, we minimize their use. Nevertheless, certain complex proofs required minor manual work, which we clearly indicate.


Fig. 3. Selected evaluation results. "LOC" counts the lines of program code, while "Spec" counts specification code and assertions. "Time" measures in seconds the time taken to solve the proofs. "Fully auto." determines whether manual tactics were used.

The left table in Figure 3 contains a selection of the iterators and adapters we have verified. The Range, IterMut, Skip and Take iterators are implementations of the iterators described in Sections 4.1 to 4.3. The Fuse adapter is responsible for transforming any iterator into a fused one, which will always return None after the first, never resuming iteration. Two versions of Map are provided: the first is the standard library Map, which is restricted to closures whose preconditions are 'history-free'; the version in MapExt is provided with ghost information about previous calls, as explained in Section 6.

Some manual proof steps were required to prove several iterators. For Skip<I> and Fuse, the manual tactics consist only of telling Why3 to access lemmas about sequences. For Map and MapExt, tactics were used to instantiate quantifiers within the production relation. We think that the use of ghost variables and of the SMT theory of sequences could lift the use of manual tactics.

We also verified several clients of iterators, sometimes featuring combinations of several iterators. The example decuple\_range maps a Range, multiplying elements by 10, collecting the results into a vector and verifying functional correctness; counter is an annotated version of the example in the introduction, verifying that we can use mutable state to count the elements of an iterator; concat\_vec uses extend to append an iterator to the end of a vector; all\_zero uses IterMut to zero every cell of a vector; take\_skip checks that if we truncate an iterator to the first n elements and then skip them, the resulting iterator must be empty. We have larger scale examples where iterators are used in the context of a larger verified development: hillel is a port of a prior Creusot solution to Hillel Wayne's verification challenges [16]; knights\_tour is the same for the Knight's Tour problem. In both of these cases, updating the code to use for-loops and iterators actually reduced the number of lines of specification.

Because our lines of specification include the assertions which test functional properties, we believe the resulting overhead is reasonable, especially in our client examples. Additionally, our specifications for iterators seem to have low impact on verification times. We compared hillel and knights\_tour with alternative versions that only differ by using traditional while loops instead of iterators; verification times are 0.91 and 1.14, respectively. This provides evidence that integrating our iterators does not cause prohibitive increases in verification time.

### 8 Related and Future Work

RustHorn [7] and RustHornBelt [8] show how the non-aliasing guarantees of Rust can be used for reducing the verification of Rust programs into the proof of first-order logic formulas. These works serve as theoretical foundations for Creusot [4], which we use to evaluate our specification scheme for iterators.

Prusti [1] is a semi-automatic verifier for Rust built on the Viper [10] separation logic verification platform. Prusti models mutable borrowing and ownership using separation logic permissions, unlike our choice of using a prophetic mutable value semantics. This leads to differences in the specification languages: whereas Creusot uses the ^ operator to reason about borrows, Prusti uses a notion called pledges. Pledges are assertions which must be true at the end of a specific lifetime. At the time of writing, pledges are not fully first-class in Prusti's specification logic: they are used through a kind of postcondition. In particular a ghost predicate like produces cannot contain a pledge. The ^ operator can be used anywhere in specifications, which allows us to give a natural specification to mutating iterators like IterMut (Section 4.2).

The verification of higher-order programs has been studied by Régis-Gianas and Pottier [14], who verify them using higher-order logic. Prusti supports closures by modeling them in Viper's separation logic [17]. Like our approach, Prusti transforms specifications of higher-order programs into first-order verification conditions, but in separation logic. They introduce several constructs to specify closures: history invariants, specification entailment, and call descriptions. We instead enable users to refer to pre- and post- conditions of closures via a trait. While we do not have the constructs Prusti provides primitively for closures, we believe these constructs can be encoded using our primitives, at the cost of lower ergonomics. Our approach is more expressive: unlike Prusti's call descriptions, we can distinguish the order of calls (see Section 6). Also, Prusti's approach for borrows makes it difficult to handle iterators such as IterMut.

Like us, Aeneas [6] verifies Rust programs by translation to a functional language, and targets traditional proof assistants such as Coq or F\*. They use a technique called backward functions to interpret mutable borrows. To our knowledge, Aeneas supports neither closures nor iterators.

The formalization of iterators is a well-studied subject with implementations in a variety of imperative and functional languages: WhyML [5], Eiffel [11], Java [9], and OCaml [12]. Of particular relevance is the approach developed by Filliâtre and Pereira [5], which specifies iterators in WhyML using a ghost field visited : seq 'a and two predicates permitted : cursor 'a -> bool and completed : cursor 'a -> bool where cursor 'a is an iterator for values of type 'a. This work leverages Why3's regions system to distinguish individual cursors over time. In contrast, in our context, we lose object identity: there is no way to identify that two iterator values are two successive states of the same iterator. We thus generalize this approach to our setting by explicitly providing pre- and post- states in produces. Our work is also more expressive: we specify and verify higher-order iterators using potentially mutable closures, which are ruled out by Why3's region system. The framework of iteration described by Polikarpova, Tschannen, and Furia [11] is limited to finite, deterministic iteration: the user must provide up front the sequence of abstract values the iterator will produce. Pottier [12] presents an implementation of iterators for a hash map written in OCaml. They do this by working in the separation logic CFML [2], utilizing Coq's powerful but manual reasoning mechanisms for theorem proving. While Pottier does not provide a general specification of iterators (cascades) with mutable state, CFML should permit it, though usage may require a challenging proof.

Future Work. While we have specified and proved key iterators, many more remain. The filter adapter is interesting as each call to next may make an unbounded number of steps with the underlying iterator using the provided mutable closure. Rust provides a hierarchy of traits that further refine iterators, like DoubleEndedIterator and ExactSizeIterator. The recent integration of generic associated types enables new, more flexible forms of iteration like lending iterators. We believe these would naturally integrate into our framework, but this remains to be done. Finally, while we believe we have developed a correct and simple approach to specify closures, the ergonomics leave much room for improvement. Improving this will help make our specifications more concise and user-friendly. In particular, we would like to explore automatic inference of pre- and postconditions of simple closures.

### Data availability

The implementation of Creusot and the examples that we used to evaluate our methodology in Section 7 form an artifact available [3] on Zenodo with DOI 10.5281/zenodo.7305463.

#### References



## Extending a High-Performance Prover to Higher-Order Logic

Petar Vukmirović<sup>1</sup>, Jasmin Blanchette<sup>1,2</sup>, and Stephan Schulz<sup>3</sup>

> <sup>1</sup> Vrije Universiteit Amsterdam, Amsterdam, The Netherlands petar.vukmirovic2@gmail.com, j.c.blanchette@vu.nl
>
> <sup>2</sup> Université de Lorraine, CNRS, Inria, LORIA, Nancy, France
>
> <sup>3</sup> DHBW Stuttgart, Stuttgart, Germany stephan.schulz@dhbw-stuttgart.de

Abstract. Most users of proof assistants want more proof automation. Some proof assistants discharge goals by translating them to first-order logic and invoking an efficient prover on them, but much is lost in translation. Instead, we propose to extend first-order provers with native support for higher-order features. Building on our extension of E to λ-free higher-order logic, we extend E to full higher-order logic. The result is the strongest prover on benchmarks exported from a proof assistant.

### 1 Introduction

In the last few decades, proof assistants have become indispensable tools for developing trustworthy formal proofs. They are used both in academia to verify mathematical theories [17] and in industry to verify the correctness of hardware [21] and software [16,22,24]. However, due to the lack of strong built-in proof automation, proving seemingly simple goals can be a tedious manual task. To mitigate this, many proof assistants include a subsystem such as CoqHammer, HOL(y)Hammer, or Sledgehammer [9] that translates higher-order goals to first-order logic and passes them to efficient first-order automatic provers. If a first-order prover succeeds, the proof is reconstructed and the goal is closed.

Unfortunately, the translation of higher-order constructs is clumsy and leads to poor performance on goals that require higher-order reasoning. Using native higher-order provers such as Satallax [10] as backends is not always a good solution because they are much less efficient than their first-order counterparts [37]. To bridge this gap, in 2016 we proposed to develop a new generation of higher-order provers that extend the arguably most successful first-order calculus, superposition, to higher-order logic, starting from a position of strength.

Our research has focused on three milestones: supporting λ-free higher-order logic, adding λ-terms, and adding first-class Boolean terms. In 2019, we extended the state-of-the-art first-order prover E [32] with a λ-free superposition calculus [42], obtaining a version of E called Ehoh, as a stepping stone towards full higher-order logic. Together with Bentkamp, Tourret, and Waldmann, we have since developed calculi, called λ-superposition, corresponding to the other two milestones [5,4] and implemented them in the experimental superposition prover Zipperposition [14]. This OCaml prover is not nearly as efficient as E. Nevertheless, it has won the higher-order division of the CASC prover competition [39] in 2020, 2021, and 2022, ending nearly a decade of Satallax domination.

© The Author(s) 2023 S. Sankaranarayanan and N. Sharygina (Eds.): TACAS 2023, LNCS 13994, pp. 111–129, 2023. https://doi.org/10.1007/978-3-031-30820-8\_10

We now fulfill a four-year-old promise: We present the extension of Ehoh to full higher-order logic (Sect. 2) based on incomplete variants of λ-superposition. We call this prover λE. In λE's implementation, we used the extensive experience with Zipperposition to choose a set of effective rules that could easily be retrofitted into an originally first-order prover. Another guiding principle was gracefulness: Our changes should not impact the strong first-order performance of E and Ehoh.

One of the main challenges we faced was retrofitting λ-terms in Ehoh's term representation (Sect. 3). Furthermore, Ehoh's inference engine assumes that inferences compute a most general unifier. We implemented a higher-order unification procedure [41] that can return multiple unifiers (Sect. 4) and integrated it in the inference engine. Finally, we extended and adapted the superposition rule, resulting in an incomplete, pragmatic variant of λ-superposition (Sect. 5).

We evaluated λE on a selection of proof assistant benchmarks as well as all higher-order theorems in the TPTP library [38] (Sect. 6). λE outperformed all other higher-order provers on the proof assistant benchmarks; on the TPTP benchmarks, it ended up second only to the cooperative version of Zipperposition, which employs Ehoh as a backend. An arguably fairer comparison without the backend puts λE in first place for both benchmark suites. We also compared the performance of λE with E on first-order problems and found that no overhead has been introduced by the extension to higher-order logic.

λE is part of the E prover's development repository and will be part of E 3.0. It can be enabled by passing the option --enable-ho to the configure script. E and λE's source code is freely available online.<sup>1</sup>

### 2 Logic

Our target logic is monomorphic classical higher-order logic with Hilbert choice. The following text is partly based on Vukmirović et al. [40, Sect. 2].

Terms s, t, u, v are inductively defined as free variables F, X, . . ., bound variables x, y, z, . . . , constants f, g, a, b, . . . , applications s t, and λ-abstractions λx. s. Bound variables may be loose (e.g., y in λx. y a) [27].

We let s t̄<sub>n</sub> stand for s t<sub>1</sub> … t<sub>n</sub> and λx̄<sub>n</sub>. s for λx<sub>1</sub>. … λx<sub>n</sub>. s. Every β-normal term can be written as λx̄<sub>m</sub>. s t̄<sub>n</sub>, where s is not an application; we call s the head of the term. If s is a free variable, we call the term flex; otherwise, the term is rigid. A term of type o, where o is the distinguished Boolean type, is called a formula. A term whose type is of the form τ<sub>1</sub> → ··· → τ<sub>n</sub> → o is called a predicate. Logical symbols are part of the signature and may thus occur within terms. We write them in bold: **⊥**, **⊤**, **¬**, **∧**, **∨**, **→**, **↔**, **∀**, **∃**, **≈**.

On top of the terms, we define some clausal structure. This structure is needed by λ-superposition. A literal l is an equation s ≈ t or a disequation s ≉ t. A clause is a finite multiset of literals, interpreted and written disjunctively: l<sub>1</sub> ∨ ··· ∨ l<sub>n</sub>.

<sup>1</sup> https://github.com/eprover/eprover.git

#### 3 Terms

E is designed around perfect term sharing [25], a principle that we kept in Ehoh and λE: Any two structurally identical terms are guaranteed to be the same object in memory. This is achieved through term cells, which represent individual terms. Each cell has (among other fields) (1) f\_code, an integer corresponding to the symbol at the head of the term (negative if the head is a free variable, positive otherwise); (2) num\_args, corresponding to the number of arguments applied to the head; and (3) args, an array of size num\_args of pointers to argument terms. We use the notation f(s<sub>1</sub>, …, s<sub>n</sub>) to denote a cell whose f\_code corresponds to f, num\_args equals n, and args points to the cells for s<sub>1</sub>, …, s<sub>n</sub>.
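Perfect sharing is commonly obtained by hash-consing. The following sketch illustrates the idea in Rust; it is not E's C implementation. The field names `f_code` and `args` follow the text, while `TermBank`, `insert`, and the use of `Rc` and `HashMap` are our assumptions.

```rust
use std::collections::HashMap;
use std::rc::Rc;

/// Simplified term cell; `num_args` is `args.len()`.
#[derive(Debug)]
struct Cell {
    f_code: i64,         // negative: free variable head; positive otherwise
    args: Vec<Rc<Cell>>, // pointers to (already shared) argument cells
}

/// Hypothetical term bank. Assumes every argument passed to `insert`
/// was itself returned by the same bank, so that pointer equality on
/// arguments coincides with structural equality.
struct TermBank {
    table: HashMap<(i64, Vec<*const Cell>), Rc<Cell>>,
}

impl TermBank {
    fn new() -> Self {
        TermBank { table: HashMap::new() }
    }

    /// Returns the unique cell for `f_code(args)`, creating it if needed.
    fn insert(&mut self, f_code: i64, args: Vec<Rc<Cell>>) -> Rc<Cell> {
        let ptrs: Vec<*const Cell> = args.iter().map(|a| Rc::as_ptr(a)).collect();
        self.table
            .entry((f_code, ptrs))
            .or_insert_with(|| Rc::new(Cell { f_code, args }))
            .clone()
    }
}
```

Since arguments are already shared, the lookup key only needs the head code and the argument addresses, so no recursive comparison of subterms is required.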

Like Leo-III [33, Sect. 4.8], Ehoh represents λ-free higher-order terms using a flattened, spine notation [12]. Thus, the terms f, f a, and f a b are represented by the cells f, f(a), and f(a, b). To ensure that free variables are perfectly shared, Ehoh treats applied free variables differently: Arguments are not applied directly to a free variable, but using a distinguished symbol @ of variable arity. For example, the term X a b is represented by the cell @(X, a, b). This ensures that two different occurrences of the free variable X correspond to the same object, which makes substitutions more efficient [42].

Representation of λ-Terms. To support full higher-order logic, Ehoh's λ-free cell data structure must be extended to support the λ binder. We use the locally nameless representation [13]: De Bruijn indices represent (possibly loose) bound variables, whereas we keep the current representation for free variables.

Extending the term representation of Ehoh with a new term kind involves intricate manipulation of the cell data structure. De Bruijn indices must be represented like other cells with either a negative or a positive f\_code, but the code must clearly identify that the cell is a De Bruijn index.

Apart from during β-reduction, De Bruijn indices mostly behave like constants. Therefore, we choose to represent De Bruijn indices using positive f\_codes: The De Bruijn index i will have f\_code i. To ensure that De Bruijn indices are not mistaken for function symbols, we use the cell's properties bitfield, which holds precomputed properties. We introduce the property IsDBVar to denote that the cell represents a De Bruijn index. De Bruijn indices are systematically created through a dedicated function that sets the IsDBVar property. When given the same De Bruijn index and type, this function always returns the same object. Finally, we guard all the functions and macros that manipulate function codes to check if the property IsDBVar is set. To ensure perfect sharing of De Bruijn indices, arguments to De Bruijn indices are applied like for free variables, using @.

Extending cells to support λ-abstraction is easier. Each λ-abstraction has the distinguished function code LAM as the head symbol and two arguments: (1) a De Bruijn index 0 of the type of the abstracted variable; (2) the body of the λ-abstraction. Consider the term λx. λy. f x x, where both x and y have the type ι. This term is represented as λ λ f 1 1 in locally nameless representation, where bold numbers represent De Bruijn indices. In λE, the same term is represented by the cell LAM(0, LAM(0, f(1, 1))), where all De Bruijn variables have type ι.

The first argument of LAM is redundant, since it can be deduced from the type of the λ-abstraction. However, basic λ-term manipulation operations often require access to this term. We store it explicitly to avoid creating it repeatedly.
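The locally nameless encoding of the example can be reproduced with a small converter from named terms. This is a simplified sketch: types and the redundant first argument of LAM are omitted, and the names `Named`, `Cell`, and `to_nameless` are ours.

```rust
/// Named λ-terms, used as input only.
enum Named {
    Var(&'static str),
    Lam(&'static str, Box<Named>),
    App(Box<Named>, Vec<Named>),
}

/// Locally nameless terms: bound variables are De Bruijn indices,
/// free variables and constants keep their names.
#[derive(Debug, PartialEq)]
enum Cell {
    Db(usize),
    Free(&'static str),
    Lam(Box<Cell>),
    App(Box<Cell>, Vec<Cell>),
}

/// `bound` lists the binders crossed so far, innermost last.
fn to_nameless(t: &Named, bound: &mut Vec<&'static str>) -> Cell {
    match t {
        Named::Var(x) => match bound.iter().rev().position(|b| b == x) {
            Some(i) => Cell::Db(i), // distance to the binder
            None => Cell::Free(*x),
        },
        Named::Lam(x, body) => {
            bound.push(*x);
            let b = to_nameless(body, bound);
            bound.pop();
            Cell::Lam(Box::new(b))
        }
        Named::App(head, args) => Cell::App(
            Box::new(to_nameless(head, bound)),
            args.iter().map(|a| to_nameless(a, bound)).collect(),
        ),
    }
}
```

Applied to λx. λy. f x x, the converter yields two nested abstractions whose body applies f to the index 1 twice, matching λ λ f 1 1.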

Efficient β-Reduction. Terms are stored in βη-reduced form. As these two reductions are performed very often, they ought to be efficient. Ehoh performs β-reduction by reducing the leftmost outermost β-redex first. To represent β-redexes, E uses the @ symbol. Thus, the term (λx. λy.(x y))f a is represented by @(LAM(0, LAM(0, @(1, 0))), f, a). Another option would have been to add arguments applied to λ-terms directly to the λ representation (as in LAM(0, LAM(0, @(1, 0)), f, a)), but this would break the invariant that LAM has two arguments. Furthermore, replacing free variables with λ-abstractions (e.g., replacing X with λx. x in @(X, a)) would require additional normalization.

A term can be β-reduced as follows: When a cell @(LAM(0, s), t) is encountered, the field binding (normally used to record the substitution for a free variable) of the cell 0 is set to t. Then s is traversed to instantiate every loose occurrence of 0 in s with binding, whose loose De Bruijn indices are shifted by the number of λ binders above the occurrence of 0 in s [20]. Next, this procedure is applied to the resulting term and its subterms, in leftmost outermost fashion.
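The procedure can be sketched on a minimal De Bruijn term type. The code below is an illustration under simplifying assumptions: applications are binary instead of using @ with a spine, types are omitted, terms are not shared, and substitution rebuilds terms instead of using a binding field. All names are ours. It is intended for simply typed terms, where normalization terminates.

```rust
#[derive(Clone, Debug, PartialEq)]
enum Term {
    Db(usize),                 // De Bruijn index
    Sym(&'static str),         // constant or free variable
    App(Box<Term>, Box<Term>), // binary application
    Lam(Box<Term>),            // λ-abstraction (types omitted)
}

fn app(f: Term, a: Term) -> Term { Term::App(Box::new(f), Box::new(a)) }
fn lam(b: Term) -> Term { Term::Lam(Box::new(b)) }

/// Adds `by` to every loose index of `t`; `cutoff` counts binders crossed.
fn shift(t: &Term, by: usize, cutoff: usize) -> Term {
    match t {
        Term::Db(i) if *i >= cutoff => Term::Db(*i + by),
        Term::Db(_) | Term::Sym(_) => t.clone(),
        Term::App(f, a) => app(shift(f, by, cutoff), shift(a, by, cutoff)),
        Term::Lam(b) => lam(shift(b, by, cutoff + 1)),
    }
}

/// Instantiates index `depth` in `s` with `t` (shifted by the number of
/// binders above the occurrence) and accounts for the removed binder.
fn subst(s: &Term, depth: usize, t: &Term) -> Term {
    match s {
        Term::Db(i) if *i == depth => shift(t, depth, 0),
        Term::Db(i) if *i > depth => Term::Db(*i - 1),
        Term::Db(_) | Term::Sym(_) => s.clone(),
        Term::App(f, a) => app(subst(f, depth, t), subst(a, depth, t)),
        Term::Lam(b) => lam(subst(b, depth + 1, t)),
    }
}

/// Contracts head redexes only (weak head normal form).
fn whnf(t: &Term) -> Term {
    if let Term::App(f, a) = t {
        return match whnf(f) {
            Term::Lam(body) => whnf(&subst(&body, 0, a)),
            g => app(g, (**a).clone()),
        };
    }
    t.clone()
}

/// Leftmost-outermost normalization: head first, then subterms.
fn normalize(t: &Term) -> Term {
    match whnf(t) {
        Term::App(f, a) => app(normalize(&f), normalize(&a)),
        Term::Lam(b) => lam(normalize(&b)),
        atom => atom,
    }
}
```

On the example (λx. λy. (x y)) f a from the text, `normalize` returns the application of f to a.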

λE's β-normalization works in this way, but it features a few optimizations. First, given a term of the form (λx̄<sub>n</sub>. s) t̄<sub>n</sub>, λE, like Leo-III [34], replaces the bound variables x<sub>i</sub> with t<sub>i</sub> in parallel. Avoiding the construction of intermediate terms reduces the number of recursive function calls and calls to the cell allocator.

Second, in line with the gracefulness principle, we want λE to incur little (or no) overhead on first-order problems and to excel on higher-order problems with a large first-order component. If β-reduction is implemented naively, finding a β-redex involves traversing the entire term. On purely first-order terms, β-reduction is then a waste of time. To avoid this, we use Ehoh's perfectly shared terms and their properties field. We introduce the property HasBetaReducibleSubterm, which is set if a cell is β-reducible. Whenever a new cell that contains a β-reducible term as a direct subterm is shared, the property is set. Setting of the property is inductively continued when further superterms are shared. For example, in the term t = f a (g((λx. x) a)), the cells for (λx. x) a, g ((λx. x) a), and t itself have the property HasBetaReducibleSubterm set. When it needs to find β-reducible subterms, λE will visit only the cells with this property set. This further means that on first-order subterms, a single bit masking operation is enough to determine that no subterm should be visited.

Along similar lines, we introduce a property HasDBSubterm that caches whether the cell contains a De Bruijn subterm. This makes instantiating De Bruijn indices during β-normalization faster, since only the subterms that contain De Bruijn indices must be visited. Similarly, some other operations such as shifting De Bruijn indices or determining whether a term is closed (i.e., it contains no loose bound variables) can be sped up or even avoided if the term is first-order.
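The effect of such cached properties can be sketched as follows, using HasBetaReducibleSubterm as the example. The flag is computed once, when a node is built from its already built children, and a later search skips every subtree whose flag is unset. The types and function names are ours, and a plain Boolean field stands in for the properties bitfield.

```rust
#[derive(Debug)]
enum Kind {
    Sym(&'static str),
    Db(usize),
    Lam(Box<Node>),
    App(Box<Node>, Vec<Node>), // head applied to arguments
}

#[derive(Debug)]
struct Node {
    kind: Kind,
    has_beta_redex: bool, // models HasBetaReducibleSubterm
}

fn sym(name: &'static str) -> Node {
    Node { kind: Kind::Sym(name), has_beta_redex: false }
}

fn db(i: usize) -> Node {
    Node { kind: Kind::Db(i), has_beta_redex: false }
}

fn lam(body: Node) -> Node {
    let flag = body.has_beta_redex;
    Node { kind: Kind::Lam(Box::new(body)), has_beta_redex: flag }
}

/// The flag is set if the node is a redex or any direct subterm is flagged.
fn app(head: Node, args: Vec<Node>) -> Node {
    let is_redex = matches!(head.kind, Kind::Lam(_));
    let flag = is_redex || head.has_beta_redex || args.iter().any(|a| a.has_beta_redex);
    Node { kind: Kind::App(Box::new(head), args), has_beta_redex: flag }
}

/// Counts β-redexes; `visited` records how many nodes were entered.
fn count_redexes(t: &Node, visited: &mut usize) -> usize {
    if !t.has_beta_redex {
        return 0; // one flag test prunes the whole subtree
    }
    *visited += 1;
    match &t.kind {
        Kind::App(head, args) => {
            let here = usize::from(matches!(head.kind, Kind::Lam(_)));
            here + count_redexes(head, visited)
                + args.iter().map(|a| count_redexes(a, visited)).sum::<usize>()
        }
        Kind::Lam(body) => count_redexes(body, visited),
        _ => 0,
    }
}
```

For t = f a (g((λx. x) a)), only the three flagged nodes named in the text are entered; on a purely first-order term, the search returns after a single test.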

Efficient η-Reduction. The term λx. s x is η-reduced to s whenever x does not occur unbound in s. Observing that a term cannot be η-reduced if it contains no λ-abstractions, we introduce a property HasLambda that notes the presence of λ's in a term. Only terms with λ's are visited during η-reduction.

λE performs parallel η-reduction: It recognizes terms of the form λx<sub>1</sub> … x<sub>m</sub>. s x<sub>1</sub> … x<sub>m</sub> such that none of the x<sub>i</sub> occurs unbound in s. If done naively, reducing terms of this kind requires up to m traversals of s to check if each x<sub>i</sub> occurs in s. In λE, exactly one traversal of s is required. More precisely, when η-reducing a cell LAM(0, s), λE considers all λ binders in s as well. In general, the cell will be of the form LAM(0, . . . , LAM(0, t). . .), where t is not a λ-abstraction, and l is the number of LAM symbols above t. Then λE breaks the body t down into a decomposition u (n − 1) . . . 1 0 where u is not of the form . . . n; such a decomposition is unique. If n = 0, the cell is not η-reducible. Otherwise, u is traversed to determine the minimal index j of a loose De Bruijn index, taking j = ∞ if no such index exists. λE can then remove the k = min{j, l, n} rightmost outermost λ binders in LAM(0, . . . , LAM(0, t). . .) and replace t by the variant of u (n − 1) . . . (k + 1) k obtained by shifting the loose De Bruijn indices down by k.

To illustrate this convoluted De Bruijn arithmetic, we consider the term λx. λy. λz. f x x y z. This term is represented by the cell LAM(0, LAM(0, LAM(0, f(2, 2, 1, 0)))). λE splits f(2, 2, 1, 0) into two parts: u = f 2 and the arguments 2, 1, 0. Since the minimal index in u is 2, we can omit the De Bruijn indices 1 and 0 and their λ binders, yielding the η-reduced cell LAM(0, f(0, 0)).
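The arithmetic above can be sketched in Python as follows. The tuple encoding of terms and all function names are assumptions made for illustration; the sketch also assumes that applications are flattened, so that the head of an application is never itself an application.

```python
import math

# Hypothetical encoding: ("db", i), ("lam", body), ("app", head, args), or a string.

def shift(t, amount, cutoff=0):
    """Add `amount` to every loose De Bruijn index (index >= cutoff) in t."""
    if isinstance(t, str):
        return t
    if t[0] == "db":
        return ("db", t[1] + amount) if t[1] >= cutoff else t
    if t[0] == "lam":
        return ("lam", shift(t[1], amount, cutoff + 1))
    head = shift(t[1], amount, cutoff)
    return ("app", head, tuple(shift(a, amount, cutoff) for a in t[2]))

def min_loose_index(t, depth=0):
    """Smallest loose De Bruijn index of t, measured at t's root, or infinity."""
    if isinstance(t, str):
        return math.inf
    if t[0] == "db":
        return t[1] - depth if t[1] >= depth else math.inf
    if t[0] == "lam":
        return min_loose_index(t[1], depth + 1)
    best = min_loose_index(t[1], depth)
    for a in t[2]:
        best = min(best, min_loose_index(a, depth))
    return best

def eta_reduce_parallel(t):
    """Remove as many η-redundant binders as possible with one traversal of u."""
    l, body = 0, t
    while not isinstance(body, str) and body[0] == "lam":
        l, body = l + 1, body[1]        # l = number of λ binders above the body
    if l == 0 or isinstance(body, str) or body[0] != "app":
        return t
    head, args = body[1], body[2]
    n = 0                               # longest argument suffix (n-1) ... 1 0
    while n < len(args) and args[len(args) - 1 - n] == ("db", n):
        n += 1
    if n == 0:
        return t
    u_args = args[:len(args) - n]
    u = ("app", head, u_args) if u_args else head
    k = min(min_loose_index(u), l, n)   # number of binders that can be dropped
    if k == 0:
        return t
    kept = u_args + args[len(args) - n:len(args) - k]
    reduced = shift(("app", head, kept) if kept else head, -k)
    for _ in range(l - k):              # re-attach the surviving binders
        reduced = ("lam", reduced)
    return reduced

# λx. λy. λz. f x x y z, i.e., LAM(0, LAM(0, LAM(0, f(2, 2, 1, 0))))
term = ("lam", ("lam", ("lam",
        ("app", "f", (("db", 2), ("db", 2), ("db", 1), ("db", 0))))))
reduced_term = eta_reduce_parallel(term)
```

On the example from the text, the sketch finds l = 3, n = 3, and j = 2, hence k = 2, and produces the encoding of LAM(0, f(0, 0)).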

Parallel η-reduction both speeds up η-reduction and avoids creating intermediate terms. For finding the minimal loose De Bruijn index, optimizations such as the HasDBSubterm property are used.

Representation of Boolean Terms. E and Ehoh represent Boolean terms using cells whose f\_codes are reserved for logical symbols. Quantified formulas are represented by cells in which the first argument is the quantified variable and the second one is the body of the quantified formula. For example, the term ∀x. p x corresponds to the cell ∀(X, p(X)), where X is a free variable. This representation is convenient for parsing and clausification, which is what E and Ehoh use it for, but in full higher-order logic, it is problematic during proof search: Booleans can occur as subterms in clauses, as in q(X) ∨ p(∀(X,r(X))), and instantiating X in the first literal should not affect X in the second literal.

To avoid this issue, in λE we use λ binders to represent quantified formulas, as is customary in higher-order logic [1, §51]. Thus, ∀x. s is represented by ∀ (λx. s). Quantifiers are then unary symbols that do not directly bind the variables. Since λE represents bound variables using De Bruijn indices, this solves all α-conversion issues. However, this solution is incompatible with thousands of decades-old lines of clausification code that assumes E's representation of quantifiers. Therefore, λE converts quantified formulas only after clausification, for Boolean terms that occur in a higher-order context (e.g., as argument to a function symbol).

New Term Orders. The λ-superposition calculus is parameterized by a term order that is used to break symmetries in the search space. We implemented the versions of the Knuth–Bendix order (KBO) and lexicographic path order (LPO) for higher-order terms described by Bentkamp et al. [4]. These orders encode λ-terms as first-order terms and then invoke the standard KBO or LPO. For efficiency, we implemented separate KBO and LPO functions that compute the order directly, intertwining the encoding and the order computation.

Ehoh cells contain a binding field that can be used to store the substitution for a free variable. Substitutions can then be applied by following the binding pointers, replacing each free variable with its instance. Thus, when Ehoh needs to perform a KBO or LPO comparison of an instantiated term, it needs only follow the binding pointers. In full higher-order logic, however, instantiating a variable can trigger a chain of βη-reductions, changing the shape of the term dramatically. To prevent this, λE computes the βη-reduced instances of the terms before comparing them using KBO or LPO.

### 4 Unification, Matching, and Term Indexing

Standard superposition crucially depends on the concept of a most general unifier (MGU). In higher-order logic, the concept is replaced by that of a complete set of unifiers (CSU), which may be infinite. Vukmirović et al. [41] designed an efficient procedure to enumerate a CSU for a term pair. It is implemented in Zipperposition, together with some extensions to term indexing. In λE, we further improve the performance of this procedure by implementing a terminating, incomplete variant. We also introduce a new indexing data structure.

The Unification Procedure. The unification procedure works by maintaining a list of unification pairs to be solved. After choosing a pair, it first normalizes it by β-reducing and instantiating the heads of both terms in the pair. Then, if either head is a variable, it computes an appropriate binding for this variable, thereby approximating the solution.

Unlike in first-order and λ-free higher-order unification, in the full higher-order case there may be many bindings that lead to a solution. To reduce this mostly blind guessing of bindings, the procedure features support for oracles [41]. These are procedures that solve the unification problem for a subclass of higher-order terms on which unification is decidable and, for λE, unary. Oracles help increase performance, avoid nontermination, and avoid redundant bindings.

Vukmirović et al. described their procedure as a transition system. In λE, the procedure is implemented nonrecursively, and the unifiers are enumerated using an iterator object that encapsulates the state of the unifier search. The iterator consists of five fields: (1) constraints, which holds the unification constraints; (2) bt\_state, a stack that contains information necessary to backtrack to a previous state; (3) branch\_iter, which stores how far we are in exploring different possibilities from the current search node; (4) steps, which remembers how many different unification bindings (such as imitation, projection, and identification) are applied; and (5) subst, a stack storing the variables bound so far.

The iterator is initialized to hold the original problem in constraints, and all other fields are initially empty. The unifiers are retrieved one by one by calling the function ForwardIter. It returns True if the iterator made progress, in which case the unifier can be read via the iterator's subst field. Otherwise, no more unifiers can be found, and the iterator is no longer valid. The function's pseudocode is given below, including two auxiliary functions:

```
function NormalizeHead(t) is
  if t.head = @ ∧ t.args[0].is_lambda() then
    reduce the top-level β-redex in t
    return NormalizeHead(t)
  else if t.head.is_var() ∧ t.head.binding ≠ Nil then
    t.head ← t.head.binding
    return NormalizeHead(t)
  else
    return t

function BacktrackIter(iter) is
  if iter.bt_state.empty() then
    clear all fields in iter
    return False
  else
    pop (constraints, branch_iter, steps, subst) from iter.bt_state
    set the corresponding fields of iter
    return True

function ForwardIter(iter) is
  forward ← ¬ iter.constraints.empty() ∨ BacktrackIter(iter)
  while forward ∧ ¬ iter.constraints.empty() do
    (lhs, rhs) ← pop pair from iter.constraints
    lhs ← NormalizeHead(lhs)
    rhs ← NormalizeHead(rhs)
    normalize and discard the λ prefixes of lhs and rhs
    if ¬ lhs.head.is_var() ∧ rhs.head.is_var() then
      swap lhs and rhs
    if lhs.head.is_var() then
      oracle_res ← Fixpoint(lhs, rhs, iter.subst)
      if oracle_res = NotInFragment then
        oracle_res ← Pattern(lhs, rhs, iter.subst)
      if oracle_res = NotUnifiable then
        forward ← BacktrackIter(iter)
      else if oracle_res = NotInFragment then
        n_steps, n_branch_iter, n_binding ←
          NextBinding(lhs, rhs, iter.steps, iter.branch_iter)
        if n_branch_iter ≠ BindEnd then
          push pair (lhs, rhs) back to iter.constraints
          push quadruple (iter.constraints, n_branch_iter,
            iter.steps, iter.subst) onto iter.bt_state
          extend iter.subst with n_binding
          iter.steps ← n_steps
          iter.branch_iter ← BindBegin
        else if lhs.head = rhs.head then
          create constraint pairs of arguments of lhs and rhs
            and push them to iter.constraints
          iter.branch_iter ← BindBegin
        else
          forward ← BacktrackIter(iter)
    else if lhs.head = rhs.head then
      create constraint pairs of arguments of lhs and rhs
        and push them to iter.constraints
    else
      forward ← BacktrackIter(iter)
  return forward
```
ForwardIter begins by backtracking if the previous attempt was successful (i.e., all constraints were solved). If it finds a state from which it can continue, it takes term pairs from constraints until there are no more constraints or it is determined that no unifier exists. The terms are normalized by instantiating the head variable with its binding and reducing the potential top-level β-redex that might appear. This instantiation and reduction process is repeated until there are no more top-level β-redexes and the head is not a variable bound to some term. Then the term with shorter λ prefix is expanded (only on the top level) so that both λ prefixes have the same length. Finally, the λ prefix is ignored, and we focus only on the body. In this way, we avoid fully substituting and normalizing terms and perform just enough operations to determine the next step of the procedure.
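The head normalization described above can be made concrete with a small Python sketch. It is a loose analogue of NormalizeHead under assumptions of our own: terms are nested tuples, variable bindings live in a dictionary rather than in a binding field, and the helper names are hypothetical.

```python
# Hypothetical encoding: ("db", i), ("lam", body), ("app", head, args), or a string.

def shift(t, amount, cutoff=0):
    """Add `amount` to every loose De Bruijn index (index >= cutoff) in t."""
    if isinstance(t, str):
        return t
    if t[0] == "db":
        return ("db", t[1] + amount) if t[1] >= cutoff else t
    if t[0] == "lam":
        return ("lam", shift(t[1], amount, cutoff + 1))
    head = shift(t[1], amount, cutoff)
    return ("app", head, tuple(shift(a, amount, cutoff) for a in t[2]))

def apply_term(head, args):
    """Apply head to args, flattening nested applications."""
    if not args:
        return head
    if not isinstance(head, str) and head[0] == "app":
        return ("app", head[1], head[2] + tuple(args))
    return ("app", head, tuple(args))

def subst(body, arg, depth=0):
    """Replace index `depth` in body by arg; lower the loose indices above it."""
    if isinstance(body, str):
        return body
    if body[0] == "db":
        i = body[1]
        if i == depth:
            return shift(arg, depth)
        return ("db", i - 1) if i > depth else body
    if body[0] == "lam":
        return ("lam", subst(body[1], arg, depth + 1))
    head = subst(body[1], arg, depth)
    return apply_term(head, tuple(subst(a, arg, depth) for a in body[2]))

def normalize_head(t, binding):
    """Instantiate the head variable and reduce top-level β-redexes only."""
    while True:
        if isinstance(t, str):
            if t in binding:
                t = binding[t]
                continue
            return t
        if t[0] != "app":
            return t
        head, args = t[1], t[2]
        if not isinstance(head, str) and head[0] == "lam":
            t = apply_term(subst(head[1], args[0]), args[1:])
        elif isinstance(head, str) and head in binding:
            t = apply_term(binding[head], args)
        else:
            return t                    # arguments are deliberately left untouched

# With X bound to λx. λy. f y x, the term X a b normalizes to f b a.
binding = {"X": ("lam", ("lam", ("app", "f", (("db", 0), ("db", 1))))), "Y": "c"}
normalized = normalize_head(("app", "X", ("a", "b")), binding)
```

As in the procedure, only the head is brought into normal form: a bound variable occurring in an argument position, such as Y in g Y, is not instantiated at this stage.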

If either term of the constraint is flex, we first invoke oracles to solve the constraint. λE implements the most efficient oracles implemented in Zipperposition: fixpoint and pattern [41, Sect. 6]. An oracle can return three results: (1) there is an MGU for the pair (Unifiable), which is recorded in subst, and the next pair in constraints is tried; (2) no MGU exists for the pair (NotUnifiable), which causes the iterator to backtrack; (3) the pair does not belong to the subclass that the oracle can solve (NotInFragment), in which case we generate possible variable bindings, that is, we guess the approximate form of the solution.

λE has a dedicated module that generates bindings (NextBinding). This module is given the current constraint and the values of branch\_iter and steps, and it either returns the next binding and the new values of branch\_iter and steps or reports that all different variable bindings are exhausted. The bindings that λE's unification procedure creates are imitation, Huet-style projection, identification, and elimination (one argument at a time) [41, Sect. 3]. A limit on the total number of applied binding rules can be set, as well as a limit on the number of individual rule applications. The binding module checks whether limits are reached using the iterator's steps field.

Computing bindings is the only point in the procedure where the search tree branches and different possibilities are explored. Thus, when λE follows the branch indicated by the binding module, it records the state to which it needs to return should the followed branch be backtracked. The state consists of the values of constraints, steps, and subst before the branch is followed and the value of branch\_iter that points past the followed branch. The value of branch\_iter is one of BindBegin, which denotes that no binding was created; an intermediate value that NextBinding uses to remember how far through the bindings it is; or BindEnd, which indicates that all bindings are exhausted.

If all bindings are exhausted, the procedure checks whether the pair is flex–flex and both sides have the same head. If so, the pair is decomposed and constraints are derived from the pair's arguments; otherwise, the iterator backtracks. If the pair is rigid–rigid, for unification to succeed, the heads of both sides must be the same. Unification then continues with new constraints derived from the arguments. Otherwise, the iterator must be backtracked.

Matching. In E, the matching algorithm is mostly used inside simplification rules such as demodulation and subsumption [29]. As these rules must be efficiently performed, using a complex matching algorithm is not viable. Instead, we provide a matching algorithm for the pattern class of terms [27] to complement Ehoh's λ-free higher-order matching algorithm [42, Sect. 4]. A term is a pattern if each of its free variables either has no arguments (as in first-order logic) or is applied to distinct De Bruijn indices.

To help determine whether to use the pattern or λ-free algorithm, we introduce a cached property HasNonPatternVar, which is set for terms of the form X s<sub>1</sub> … s<sub>n</sub> where n > 0 and either there exists some s<sub>i</sub> that is not a De Bruijn index or there exist indices i < j such that s<sub>i</sub> = s<sub>j</sub> is a De Bruijn index. This property is propagated to the superterms when they are perfectly shared. As a result, later checks of whether a term belongs to the pattern class can be performed in constant time.
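The defining condition of HasNonPatternVar can be written down as a short Python predicate. This is only a sketch: it recomputes the property recursively instead of caching it on shared cells, it uses a hypothetical tuple encoding of terms, and it assumes the TPTP convention that free variables start with an uppercase letter.

```python
# Hypothetical encoding: ("db", i), ("lam", body), ("app", head, args), or a string.

def is_free_var(symbol):
    # Assumption: uppercase-initial names denote free variables.
    return isinstance(symbol, str) and symbol[:1].isupper()

def is_db(t):
    return not isinstance(t, str) and t[0] == "db"

def has_non_pattern_var(t):
    """True if some free variable is applied to anything but distinct De Bruijn indices."""
    if isinstance(t, str) or t[0] == "db":
        return False
    if t[0] == "lam":
        return has_non_pattern_var(t[1])
    head, args = t[1], t[2]
    if is_free_var(head) and args:
        if any(not is_db(a) for a in args):
            return True                 # some argument is not a De Bruijn index
        return len(set(args)) != len(args)   # repeated De Bruijn index
    return has_non_pattern_var(head) or any(has_non_pattern_var(a) for a in args)

# h (λx. λy. X y x) is a pattern; g (X a b) c is not.
pattern_term = ("app", "h", (("lam", ("lam", ("app", "X", (("db", 0), ("db", 1))))),))
lambda_free_term = ("app", "g", (("app", "X", ("a", "b")), "c"))
```

A term for which the predicate is false would be eligible for the pattern matching algorithm in this sketch; a term for which it is true would fall back to the λ-free algorithm.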

We modify the λ-free higher-order matching algorithm to treat λ prefixes as above in the unification procedure, that is, by bringing the prefixes to the same length and ignoring them afterwards. This ensures that the algorithm will never try to match a free variable with a λ-abstraction, making sure that β-redexes never appear. We also modify the algorithm to ensure that free variables are never bound to terms that have loose bound variables. This algorithm cannot find many complex matching substitutions (matchers), but it can efficiently determine whether two terms are variable renamings of each other or whether a simple matcher can be used, as in the case of (X (λx. x) b, f (λx. x) b), where X ↦ f is usually the desired matcher. If this algorithm does not find a matcher and both terms are patterns, pattern matching is tried.

Indexing. E, like other modern theorem provers, efficiently retrieves unifiable or matchable pairs of terms using indexing data structures. To find terms unifiable with a query term or instances of a query term, it uses fingerprint indexing [30]. Vukmirović et al. extended this data structure to support full higher-order terms in Zipperposition [41, Sect. 6]. We use the same approach in λE, and we extend feature vector indices [31] in the same way.

E uses perfect discrimination trees [26] to find generalizations of the query term (i.e., terms of which the query term is an instance). This data structure is a trie that indexes terms by representing them in a serialized, flattened form. The left branch from the root in Figure 1 shows how the first-order terms f a X and f a a are stored. In Ehoh, this data structure is extended to support partial application and applied variables [42].

Fig. 1. First-order, λ-free higher-order, and higher-order pattern terms in a perfect discrimination tree

In λE, we extend this structure to support λ-abstractions and the higher-order pattern matching algorithm. To this end, we change the way in which terms are serialized. First, we require that all terms are fully η-expanded (except for arguments of variables applied in patterns). Then, when the term is serialized, we use a single node for applied variable terms X s<sub>1</sub> … s<sub>n</sub>, instead of a node for X followed by nodes for the arguments s<sub>1</sub>, …, s<sub>n</sub>. We serialize the λ-abstraction λx. s using a dedicated node LAM<sup>τ</sup>, where τ is the type of x, followed by the serialization of s. Other than these changes, serialization remains as in Ehoh, following the gracefulness principle. Figure 1 shows how g (X a b) c and h (λx. λy. X y x) are serialized. Since the terms are stored in serialized form, it is hard to manipulate λ prefixes of stored terms during matching. Performing η-expansion when serializing terms ensures that matchable terms have λ prefixes of the same length.

We have dedicated separate nodes for applied variables because access to arguments of applied variables is necessary for the pattern matching algorithm. Even though arguments can be obtained by querying the arity n of the variable and taking the next n arguments in the serialization, this is both inefficient and inelegant. As for De Bruijn indices, we treat them the same as function symbols.

Following the notation from the extension of perfect discrimination trees to λ-free higher-order logic [42], we now describe how enumeration of generalizations is performed. To traverse the tree, λE begins at the root node and maintains two stacks: term\_stack and term\_proc, where term\_stack contains the subterms of the query term that have to be matched, and term\_proc contains processed terms that are used to backtrack to previous states. Initially, term\_stack contains the query term, the current matching substitution σ is empty, and the successor node is chosen among the child nodes as follows:

A. If the node is labeled with a symbol ξ (where ξ is either a De Bruijn index or a constant) and the top item t of term\_stack is of the form ξ t<sub>1</sub> … t<sub>n</sub>, replace t by n new items t<sub>1</sub>, …, t<sub>n</sub>, and push t onto term\_proc.


Backtracking works in the opposite direction: If the current node is labeled with a De Bruijn index or function symbol node of arity n, pop n terms from term\_stack and move the top of term\_proc to term\_stack. If the node is labeled with LAM<sup>τ</sup> , pop the top of term\_stack and move the top of term\_proc to term\_stack. Finally, if the node is labeled with a possibly applied variable, move the top of the term\_proc to term\_stack and restore the value of σ.

As an example of how finding a generalization works, when looking for generalizations of g (f a b) c in the tree of Figure 1, the following states of stacks and substitutions emerge, from left to right:


#### 5 Preprocessing, Calculus, and Extensions

Ehoh's simple λ-free higher-order calculus performed well on Sledgehammer problems and formed a promising stepping stone to full higher-order logic [42]. When implementing support for full higher-order logic, we were guided by efficiency and gracefulness with respect to Ehoh's calculus rather than completeness. Whereas Zipperposition provides both complete and incomplete modes, λE only offers incomplete modes.

Preprocessing. Our experience with Zipperposition showed the importance of flexibility in preprocessing the higher-order problems [40]. Therefore, we implemented a flexible preprocessing module in λE.

To maintain compatibility with Ehoh, λE can optionally transform all λ-abstractions into named functions. This process is called λ-lifting [19]. λE also removes all occurrences of Boolean subterms (other than ⊥, ⊤, and free variables) in higher-order contexts using a FOOL-like transformation [23]. For example, the formula f(p ∧ q) ≈ a becomes (p ∧ q → f(⊤) ≈ a) ∧ (¬ (p ∧ q) → f(⊥) ≈ a).
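That the transformed formula in this example is equivalent to the original can be checked by brute force. The Python sketch below is a sanity check over a two-element carrier, not a proof and not part of λE; the interpretation of f as a lookup table and all names are assumptions.

```python
from itertools import product

def holds_original(f, p, q, a):
    """Truth value of f(p ∧ q) ≈ a under the given interpretation."""
    return f[p and q] == a

def holds_transformed(f, p, q, a):
    """Truth value of (p ∧ q → f(⊤) ≈ a) ∧ (¬(p ∧ q) → f(⊥) ≈ a)."""
    b = p and q
    def implies(x, y):
        return (not x) or y
    return implies(b, f[True] == a) and implies(not b, f[False] == a)

def transformation_is_equivalence():
    """Compare both formulas for every f : Bool → {0, 1}, p, q, and a."""
    carrier = (0, 1)
    booleans = (False, True)
    for f_true, f_false, p, q, a in product(carrier, carrier, booleans, booleans, carrier):
        f = {True: f_true, False: f_false}
        if holds_original(f, p, q, a) != holds_transformed(f, p, q, a):
            return False
    return True
```

The check succeeds because exactly one of the two implications has a true premise, and that implication then states the original equation with the Boolean argument replaced by its truth value.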

Many TPTP problems use the definition role to identify the definitions of symbols. λE can treat definition axioms as rewrite rules, and replace all occurrences of defined symbols during preprocessing. Furthermore, during SInE [18] axiom selection, it can always include the defined symbol in the trigger relation.

Calculus. λE implements the same superposition calculus as Ehoh with three important changes. First, wherever Ehoh requires the MGU of terms, λE enumerates unifiers from a finite subset of the CSU, as explained in Sect. 4. Second, λE uses versions of the KBO and LPO orders designed for λ-terms.

The third difference is more subtle. One of the main features of Ehoh is prefix optimization [42, Sect. 1]: a method that, given a demodulator s ≈ t, makes it possible to replace both applied and unapplied occurrences of s by t by traversing only the first-order subterms of a rewritable term. In a λ-free setting, this optimization is useful, but in the presence of βη-normalization, the shapes of terms can change drastically, making it much harder to track prefixes of terms. This is why we disable the prefix optimization in λE. To compensate for losing this optimization, we introduce the argument congruence rule AC in λE and enable positive and negative functional extensionality (PE and NE) by default:

$$\frac{s \approx t \lor C}{s \, X \approx t \, X \lor C}\,\text{AC} \quad \frac{s \not\approx t \lor C}{s \, (\text{sk} \, \overline{X}) \not\approx t \, (\text{sk} \, \overline{X}) \lor C}\,\text{NE} \quad \frac{s \, X \approx t \, X \lor C}{s \approx t \lor C}\,\text{PE}$$

AC and NE assume that s and t are of function type. In NE, $\overline{X}$ denotes all the free variables occurring in s and t, and sk is a fresh Skolem symbol of the appropriate type. PE has a side condition that X may not occur in s, t, or C.

Saturation. E's saturation procedure assumes that each attempt to perform an inference will either result in a single clause or fail due to one of the inference side conditions. Unification procedures that produce multiple substitutions break this invariant, and the saturation procedure needed to be adjusted.

For Zipperposition, Vukmirović et al. developed a variant of the saturation procedure that interleaves computing unifiers and scheduling inferences to be performed [40]. Since completeness was not a design goal for λE, we did not implement this version of the saturation procedure. Instead, in places where previously a single unifier was expected, λE consumes all elements of the iterator used for enumerating unifiers, converting each of them into a clause.

Reasoning about Formulas. Even though most of the Boolean structure is removed during preprocessing, formulas can reappear at the top level of clauses during saturation. For example, after instantiating X with λx. λy. x ∧ y, the clause X p q ∨ a ≈ b becomes (p ∧ q) ∨ a ≈ b. λE converts every clause of the form ϕ ∨ C, where ϕ has a logic symbol as its head, or it is a (dis)equation between two formulas different from ⊤, to an explicitly quantified formula. Then, the clausification algorithm is invoked on the formula to restore the clausal structure. Zipperposition features more dynamic clausification modes, but for simplicity we decided not to implement them in λE.

The λ-superposition calculus for full higher-order logic [4] includes many rules that act on Boolean subterms, which are necessary for completeness. Other than Boolean simplification rules, which use simple tautologies such as p ∧ ⊤ ↔ p to simplify terms, we have implemented none of the Boolean rules of this calculus in λE. First, we have observed that complicated rules such as FluidBoolHoist and FluidLoobHoist are hardly ever useful in practice and usually only contribute to an uncontrolled increase in the proof state size. Second, simpler rules such as BoolHoist can usually be simulated by pragmatic rules that perform Boolean extensionality reasoning, described below.

To make up for excluding Boolean rules, we use an incomplete, but more easily controllable and intuitive rule, called primitive instantiation. This rule instantiates free predicate variables with approximations of formulas that are ground instances of this variable. We use the approximations described by Vukmirović and Nummelin [43, Sect. 3.3].

λE's handling of the Hilbert choice operator is inspired by Leo-III's [35]. λE recognizes clauses of the form ¬ P X ∨ P (f P), which essentially denote that f is a choice symbol. Then, when subterm f s is found during saturation, s is used to instantiate the choice axiom for f. Similarly, Leibniz equality [43] is eliminated by recognizing clauses of the form ¬ P a ∨ P b ∨ C. These clauses are then instantiated with P ↦ λx. x ≈ a and P ↦ λx. x ≉ b, which results in a ≈ b ∨ C.
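The effect of the two Leibniz instantiations can be worked out step by step. The derivation below is our reconstruction from the clause shape given above; the intermediate simplifications (β-reduction, deletion of trivially false literals, and double-negation elimination) are assumed rather than quoted from λE:

$$\neg P\,a \lor P\,b \lor C \;\xrightarrow{\;P \,\mapsto\, \lambda x.\; x \approx a\;}\; a \not\approx a \lor b \approx a \lor C \;\longrightarrow\; b \approx a \lor C$$

$$\neg P\,a \lor P\,b \lor C \;\xrightarrow{\;P \,\mapsto\, \lambda x.\; x \not\approx b\;}\; \neg (a \not\approx b) \lor b \not\approx b \lor C \;\longrightarrow\; a \approx b \lor C$$

In the first line the literal $a \not\approx a$ is trivially false and is removed; in the second line the same happens to $b \not\approx b$, and $\neg (a \not\approx b)$ is simply $a \approx b$. Up to the orientation of the equation, both instances yield the same clause.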

Finally, λE treats induction axioms specially. Like Zipperposition [40, Sect. 4], it abstracts literals from the goal clauses and instantiates induction axioms with these abstractions. Since Zipperposition supports dynamic calculus-level clausification, induction axioms are instantiated during saturation, when the axioms are processed. In λE, this instantiation is performed immediately after clausification. After λE has collected all the abstractions, it traverses the clauses and instantiates those that have an applied variable of the same type as the abstraction.

Extensionality. λE takes a pragmatic approach to reasoning about functional and Boolean extensionality: It uses abstracting rules [5] which simulate basic superposition calculus rules but do not require unifiability of the partner terms in the inference. More precisely, assume a core inference needs to be performed between two β-reduced terms u and v, such that they can be represented as u = C[s<sub>1</sub>, . . . , s<sub>n</sub>] and v = C[t<sub>1</sub>, . . . , t<sub>n</sub>], where C is the most general "green" [5] common context of u and v, not all of s<sub>i</sub> and t<sub>j</sub> are free variables, and for at least one i, s<sub>i</sub> ≠ t<sub>i</sub>, s<sub>i</sub> and t<sub>i</sub> are not possibly applied free variables, and they are of Boolean or function type. Then, the conclusion is formed by taking the conclusion D of the core inference rule (which would be created if s and t are unifiable) and adding literals s<sub>1</sub> ≉ t<sub>1</sub> ∨ · · · ∨ s<sub>n</sub> ≉ t<sub>n</sub>.

These rules are particularly useful because λE has no rules that dynamically process Booleans in FOOL-like fashion, such as BoolHoist. For example, given the clauses f (p ∧ q) ≈ a and g (f p) ≉ b, the abstracting version of the superposition rule would result in g a ≉ b ∨ (p ∧ q) ≉ p. In this way, the Boolean structure bubbles up to the top level and is further processed by clausification. We noticed that this alleviates the need for the other Boolean rules in practice.
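The way the extra literals arise can be sketched in Python by computing the disagreement pairs of two terms with respect to their common context. This is a simplified illustration under our own assumptions: the tuple encoding and function names are hypothetical, "green" contexts are approximated by descending only through applications of identical constant heads, and the type and variable side conditions of the abstracting rules are not checked.

```python
# Hypothetical encoding: ("lam", body), ("app", head, args), or a string.
# Assumption: uppercase-initial names denote free variables.

def is_constant(symbol):
    return isinstance(symbol, str) and not symbol[:1].isupper()

def disagreement_pairs(u, v):
    """Pairs (s_i, t_i) at which u and v differ below their common context."""
    if u == v:
        return []
    both_apps = (not isinstance(u, str) and not isinstance(v, str)
                 and u[0] == "app" and v[0] == "app")
    if (both_apps and u[1] == v[1] and is_constant(u[1])
            and len(u[2]) == len(v[2])):
        pairs = []
        for s, t in zip(u[2], v[2]):   # same constant head: compare arguments
            pairs += disagreement_pairs(s, t)
        return pairs
    return [(u, v)]                     # the context ends here

# Superposing f (p ∧ q) ≈ a into g (f p) ≉ b: compare f (p ∧ q) with f p.
lhs = ("app", "f", (("app", "and", ("p", "q")),))
target = ("app", "f", ("p",))
extra_literals = disagreement_pairs(lhs, target)
```

For the example from the text, the only disagreement pair is (p ∧ q, p), which corresponds to the added literal (p ∧ q) ≉ p in the conclusion g a ≉ b ∨ (p ∧ q) ≉ p.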

#### 6 Evaluation

We now try to answer two questions about λE: How does λE compare against other higher-order provers (including Ehoh)? Does λE introduce any overhead compared with Ehoh? To answer these questions, we ran provers on problems from the TPTP library [38] and on benchmarks generated by Sledgehammer (SH) [28]. The experiments were carried out on StarExec Miami [36] nodes equipped with Intel Xeon E5-2620 v4 CPU clocked at 2.10 GHz. For the TPTP part, we used the CASC 2021<sup>2</sup> time limits: 120 s wall-clock and 960 s CPU. For SH benchmarks and to answer the other question, we used Sledgehammer's default time limit: 30 s wall-clock and CPU. The raw evaluation data is available online.<sup>3</sup>

Comparison with Other Provers. To answer the first question, we let λE compete with the top contenders in the higher-order division of CASC 2021: cvc5 0.0.7 [2], Ehoh 2.7 [42], Leo-III 1.6.6 [35], Vampire 4.6 [8], and Zipperposition 2.1 [40]. We also included Satallax 3.5 [10]. We used all 2899 higher-order theorems in TPTP 7.5.0 as well as 5000 SH higher-order benchmarks originating from the Seventeen benchmark suite [15]. On SH benchmarks, cvc5, Ehoh, λE, Vampire, and Zipperposition were run using custom schedules provided by their developers, optimized for single-core usage and low timeouts. Otherwise, we used the corresponding CASC configurations.

Although it internally does not support λ-abstractions, Ehoh 2.7 can parse full higher-order logic using λ-lifting. We included two versions of Zipperposition: coop uses Ehoh 2.7 as a backend to finish proof attempts, whereas uncoop does not. Both Ehoh and λE were run in the automatic scheduling mode. Compared with Ehoh, λE features a redesigned module for automatic scheduling, it can exploit multiple CPU cores, and its heuristics have been more extensively trained on higher-order problems.

The results are shown in Figure 2. λE dramatically improves E's higher-order reasoning capabilities compared with Ehoh. It solves 20% more TPTP benchmarks and 7% more SH benchmarks. The reason for the higher performance increase for TPTP is likely that TPTP benchmarks tend to require more higher-order reasoning than SH benchmarks, which often have a large first-order component and for which Ehoh was already very successful.

λE was envisioned as an efficient backend to proof assistants. As such, it excels on SH benchmarks, outperforming the competition. On TPTP, it outperforms all higher-order provers other than Zipperposition-coop. If Zipperposition's Ehoh backend is disabled, λE outperforms Zipperposition by a wide margin. This comparison is arguably fairer; after all, λE does not use an older version of Zipperposition as a backend. These results suggest that λE already implements most of the necessary features for a high-performance higher-order prover but could benefit from the kind of fine-tuning that Zipperposition underwent in the last four years.

Remarkably, the raw evaluation data reveals that λE solves 181 SH problems and 24 TPTP problems that Zipperposition-coop does not. The lower number of uniquely solved TPTP problems is likely because Zipperposition was heavily optimized on the TPTP.

<sup>2</sup> http://www.tptp.org/CASC/28/

<sup>3</sup> https://doi.org/10.5281/zenodo.6389849


Fig. 2. Comparison of higher-order provers

Comparison with the First-Order E. Both Ehoh and λE can be compiled in a mode that disables most of the higher-order reasoning. This mode is designed for users that are interested only in E's first-order capabilities and care a lot about performance. To answer the second evaluation question, about assessing overhead of λE, we chose all the 1138 unique problems used at CASC from 2019 to 2021 in the first-order theorem division and ran Ehoh and λE both in this first-order (FO) mode and in higher-order (HO) mode.

We fixed a single configuration of options, because Ehoh's and λE's automatic scheduling methods could select different configurations and we would not be measuring the overhead but the quality of the chosen configurations. We chose the boa configuration [42, Sect. 7], which is the configuration most often used by E 2.2 in its automatic scheduling mode. The results are shown in Figure 3.

Counterintuitively, the higher-order versions of both provers outperform the first-order counterparts. However, the difference is so small that it can be attributed to the changes to memory layout that affect the order in which clauses are chosen. Similar effects are visible when comparing the first-order versions.

CASC Results. λE also took part in CASC 2022. In the TPTP higher-order division, λE finished second, after Zipperposition, as expected. In the Sledgehammer division, λE tied with Ehoh for first place, a disappointment. The likely explanation is that λE used a wrong configuration in this division, as we found out afterwards. We expect better performance at CASC 2023.

#### 7 Discussion and Related Work

On the trajectory to λE, we developed, together with colleagues, three superposition calculi: for λ-free higher-order logic [6], for a higher-order logic with λ-abstraction but no Booleans [5], and for full higher-order logic [4]. These milestones allowed us to carefully estimate how the increased reasoning capabilities of each calculus influence its performance.

Extending first-order provers with higher-order reasoning capabilities has been attempted by other researchers as well. Barbosa et al. extended the SMT solvers CVC4 (now cvc5) and veriT to higher-order logic in an incomplete way [3]. Bhayat and Reger first extended Vampire to higher-order logic using combinatory unification [8], an incomplete approach, before they designed and implemented a complete higher-order superposition calculus based on SKBCI combinators [7]. The advantage is that combinators can be supported as a thin layer on top of λ-free terms. This calculus is also implemented in Zipperposition. However, in informal experiments, we found that λ-superposition performs substantially better, corroborating the CASC results, so we decided to make a more profound change to Ehoh and implement λ-superposition.

Possibly the only actively maintained higher-order provers built from the bottom up as higher-order provers are Leo-III [35] and Satallax's [10] successor Lash [11]. A further overview of other traditional higher-order provers and the calculi they are based on can be found in the paper about Ehoh [42, Sect. 9].

### 8 Conclusion

In 2019, the reviewers of our Ehoh paper [42] were skeptical that extending Ehoh with support for full higher-order logic would be feasible. One of them wrote:

A potential criticism could be that this step from E to Ehoh is just extending FOL by those aspects of HOL that are easily in reach with rather straightforward extensions (none of the extensions is indeed very complicated), and that the difficult challenges of fully supporting HOL have yet to be confronted.

We ended up addressing the theoretical "difficult challenges" in other work with colleagues. In this paper, we faced the practical challenges pertaining to the extension of Ehoh's data structures and algorithms to support full higher-order logic and demonstrated that such an extension is possible. Our evaluation shows that this extension makes λE the best higher-order prover on benchmarks coming from interactive theorem proving practice, which was our goal. λE lags slightly behind Zipperposition on TPTP problems. One reason might be that Zipperposition does not assume a clausal structure and can perform subtle formula-level inferences. It would be useful to implement the same features in λE. We have also only started tuning λE's heuristics on higher-order problems.

Acknowledgment. Ahmed Bhayat and Martin Suda provided Vampire configurations optimized for Sledgehammer. Andrew Reynolds did the same for cvc5. Jannis Limperg helped us debug the submission artifact. Simon Cruanes, Wan Fokkink, Mark Summerfield, and the anonymous reviewers suggested several textual improvements. We thank them all.

This research has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation program (grant agreement No. 713999, Matryoshka). Vukmirović and Blanchette have received funding from the Netherlands Organization for Scientific Research (NWO) under the Vidi program (project No. 016.Vidi.189.037, Lean Forward).

#### References



**Tools (Regular Papers)**

## The WhyRel Prototype for Modular Relational Verification of Pointer Programs

Ramana Nagasamudram<sup>1(✉)</sup>, Anindya Banerjee<sup>2</sup>, and David A. Naumann<sup>1</sup>

> <sup>1</sup> Stevens Institute of Technology, Hoboken, USA {rnagasam,dnaumann}@stevens.edu <sup>2</sup> IMDEA Software Institute, Madrid, Spain anindya.banerjee@imdea.org

Abstract. Verifying relations between programs arises as a task in various verification contexts such as optimizing transformations, relating new versions of programs with older versions (regression verification), and noninterference. However, relational verification for programs acting on dynamically allocated mutable state is not well supported by existing tools, which provide a high level of automation at the cost of restricting the programs considered. Auto-active tools, on the other hand, require more user interaction but enable verification of a broader class of programs. This article presents WhyRel, a tool for the auto-active verification of relational properties of pointer programs based on relational region logic. WhyRel is evaluated through verification case studies, relying on SMT solvers orchestrated by the Why3 platform on which it builds. Case studies include establishing representation independence of ADTs, showing noninterference, and challenge problems from recent literature.

Keywords: local reasoning · relational verification · auto-active verification · data abstraction.

### 1 Introduction

Relational properties encompass conditional equivalence of programs (as in regression verification [28]), noninterference (in which a program is related to itself via a low-indistinguishability relation), and other requirements such as sensitivity [6]. The problem we address concerns tooling for the modular verification of relational properties of heap-manipulating programs, including programs that act on differing data representations involving dynamically allocated pointer structures.

Modular reasoning about pointer programs is enabled through local reasoning using frame conditions, procedural abstraction (i.e., reasoning under hypotheses about procedures a program invokes), and data abstraction, requiring state-based encapsulation. For establishing properties of ADTs such as representation independence, encapsulation plays a crucial role, permitting implementations to rely on invariants about private state hidden from clients. Relational verification also involves a kind of compositionality, the alignment of intermediate execution steps, which enables use of simpler relational invariants and specs (see e.g. [29,17,25]).

We aim for auto-active verification [19], accessible to developers, as promoted by tools such as Dafny and Why3. Users are expected to provide specifications, annotations such as loop invariants and assertions, and, for relational verification, alignment hints. The idea is to minimize or eliminate the need for users to manually invoke tactics for proof search.

Automated inference of specs, loop invariants, or program alignments facilitates automated verification, and is implemented in some tools. But in the current state of the art these techniques are restricted to specs and invariants of limited forms (e.g., only linear arithmetic) and seldom support dynamically allocated objects. So inference is beyond the scope of this paper.

What is in scope is use of strong encapsulation, to hide information in the sense that method specs used by clients do not expose internal representation details, and to enable verification of modular correctness of a client, in the sense that its behavior is independent from internal representations. Achieving strong encapsulation for pointer programs, without undue restriction on data and control structure, is technically challenging. Auto-active tools rely on extensive axiomatization for the generation of verification conditions (VCs); for high assurance the VCs should be justified with respect to a definitional operational semantics of programs and specs.

In this article, we describe WhyRel, a prototype for auto-active verification of relational properties of pointer programs. Source programs are written in an imperative language with support for shared mutable objects (but no subtyping), dynamic allocation, and encapsulation. The assertion language is first-order and, for expressing relational properties, includes constructs that relate values of variables and pointer structures between two programs. WhyRel is based on relational region logic [1], a relational extension of region logic [4,2]. Region logic provides a flexible approach to local reasoning through the use of dynamic frame conditions [15] which capture footprints of commands acting on the heap. Verification involves reasoning explicitly about regions of memory and changes to them as computation proceeds; flexibility comes from being able to express notions such as parthood and separation in the same first-order setting.

Encapsulation is specified using a kind of dynamic frame, called a dynamic boundary: a footprint that captures a module's internal locations. Enforcing encapsulation is then a matter of ensuring that clients don't directly modify or update locations in a module's boundary. There are detailed soundness proofs for the relational logic [1], of which our prototype is a faithful implementation.

WhyRel is built on top of the Why3 platform<sup>3</sup> for deductive program verification which provides infrastructure for verifying programs written in WhyML, a subset of ML [7] with support for ghost code and nondeterministic choice. The assertion language is a polymorphic first-order logic extended with support for algebraic data types and recursively and inductively defined predicates [11]. Why3 generates VCs for WhyML which can then be discharged using a wide array of theorem provers, from interactive proof assistants such as Coq and Isabelle, to first-order theorem provers and SMT solvers such as Vampire, Alt-Ergo and Z3.

<sup>3</sup> The Why3 distribution can be found at: https://why3.lri.fr/.

Primarily, WhyRel is used as a front end to Why3. Users provide programs, specs, annotations, and for relational verification, relational specs and alignment specified using a specialized syntax for product programs. WhyRel translates source programs into WhyML, performing significant encoding so as to faithfully capture the heap model and fine-grained framing formalized in relational region logic. VCs pertinent to this logic are introduced as intermediate assertions and lemmas for the user to establish. Verification is done using facilities provided by Why3 and the primary mode of interaction is through an IDE for viewing and discharging verification conditions.

Our approach is evaluated through a number of case studies performed in WhyRel, for which we rely entirely on SMT solvers to discharge proof obligations. The primary contribution is the development of a tool for relational verification of heap manipulating programs which has been applied to challenging case studies. Examples formalized demonstrate the effectiveness of relational region logic for alignment, for expressing heap relations, and for relational reasoning that exploits encapsulation.

Organization. Sec. 2 highlights aspects of specifying programs and relational properties in WhyRel using a stack ADT example. Sec. 3 discusses examples of program alignment. Sec. 4 gives an overview of the design of WhyRel and Sec. 5 provides highlights on experience using the tool. Sec. 6 discusses related work and Sec. 7 concludes.

#### 2 A tour of WhyRel

Programs and specifications. WhyRel provides a lightweight module system to organize definitions, programs, and specs. Developments are structured into interfaces and modules that implement interfaces. In addition, for relational verification, WhyRel introduces the notion of a bimodule, described later, to relate method implementations between two (unary) modules.

We'll walk through aspects of specification in WhyRel using the STACK interface shown in Fig. 1, which describes a stack of boxed integers with push and pop operations. The interface starts by declaring global variables, pool and capacity, and client-visible fields of the Cell and Stack classes. Variable pool has type rgn, where a region is a set of references, and is used to describe objects notionally owned by modules implementing the stack interface; capacity has type int and describes an upper bound on the size of a stack. The Cell class for boxed integers is declared with a single field, val, storing an int. The Stack class is declared with three fields: rep of type region keeps track of objects used to represent the stack, size of type int stores the number of elements in the stack, and the ghost field abs of type intlist (list of mathematical integers) keeps track of an abstraction of the stack, used in specs. Class definitions can be refined later by modules implementing the interface: e.g., a module using a linked-list implementation might extend the Stack class with a field head storing a reference to the list.

```
interface STACK =
  public pool:rgn /* rgn: a set of references */
  public capacity:int
  class Cell {val:int}
  class Stack {rep:rgn; size:int; ghost abs:intlist}
  /* encapsulated locations */
  boundary {capacity, pool, pool'any, pool'rep'any}
  public invariant stkPub = ∀ s: Stack ∈ pool. 0 ≤ s.size ≤ capacity
    ∧ (∀ t: Stack ∈ pool. s ≠ t ⇒ s.rep ∩ t.rep ⊆ {null}) ∧ ...
  meth Cell(self: Cell) : unit ...
  meth getVal(self: Cell) : int ...
  meth Stack(self: Stack) : unit ensures {self ∈ pool} ...
  meth push(self: Stack, k: int) : unit
    requires {self ∈ pool ∧ self.size < capacity}
    ensures {self.abs = cons(k,old(self.abs)) ∧ ...}
    /* allowed heap effects of implementations */
    effects {rw {self}'any, self.rep'any, alloc; rd self,capacity}
  meth pop(self: Stack) : Cell
    requires {self ∈ pool ∧ self.size > 0}
    ensures {self.size = old(self.size)-1}
    ensures {result.val = hd(self.abs) ∧ self.abs = tl(old(self.abs))}
```
Fig. 1: WhyRel interface for the Stack ADT

Heap encapsulation is supported at the granularity of modules through the use of dynamic module boundaries, which describe locations internal to a module. A location is either a variable or a heap location o.f, where o is an object reference and f is its field. In WhyRel, module boundaries are specified in interfaces, and WhyRel enforces that clients do not directly read or write locations described by the boundary except through the use of module methods. For our stack example, the dynamic boundary is capacity, pool, pool'any, pool'rep'any, expressed using image expressions and the any datagroup. Given a region G and a field f of class type, the image expression G'f denotes the region containing the locations o.f of all non-null references o in G, where f is a valid field of o. If f is of type region, G'f is the union of the collection of reference sets o.f for all o in G. For f of primitive type, such as int or intlist, G'f is the empty region. The datagroup any is used to abstract from concrete field names: the expression pool'any is syntactic sugar for pool'val, ..., pool'abs. Intuitively, the dynamic boundary in Fig. 1 says that clients may not directly read or write capacity, pool, any fields of objects in pool, and any fields of objects in the rep of any Stack in pool.
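To make the image expressions in the boundary more concrete, here is a minimal Python sketch. It is not WhyRel's encoding: the heap is modeled as a dictionary from references (strings) to field records, and the helper names `image` and `any_locations`, as well as the sample heap, are assumptions made only for this illustration. Variables such as capacity and pool, which also belong to the boundary, are left out.

```python
def image(heap, region, field):
    """Approximate the image expression G'f, read as a set of references.

    heap   : dict mapping a reference (str) to a dict of its fields
    region : set of references, possibly containing None (null)
    field  : field name
    """
    result = set()
    for ref in region:
        if ref is None or field not in heap.get(ref, {}):
            continue  # null, or an object without this field
        value = heap[ref][field]
        if isinstance(value, (set, frozenset)):
            result |= value      # region-typed field: union of the stored sets
        elif isinstance(value, str):
            result.add(value)    # class-typed field: the referenced object
        # primitive values such as ints contribute nothing in this sketch
    return result


def any_locations(heap, region):
    """Locations (ref, field) denoted by G'any: all fields of non-null objects in G."""
    return {(ref, f) for ref in region if ref is not None
            for f in heap.get(ref, {})}


# Hypothetical heap: two stacks whose rep regions hold their cells.
heap = {
    "s1": {"rep": {"c1", "c2"}, "size": 2},
    "s2": {"rep": {"c3"}, "size": 1},
    "c1": {"val": 10},
    "c2": {"val": 20},
    "c3": {"val": 30},
}
pool = {"s1", "s2", None}

reps = image(heap, pool, "rep")    # plays the role of pool'rep
sizes = image(heap, pool, "size")  # empty: size is a primitive field
# Heap part of the boundary: pool'any together with pool'rep'any.
boundary = any_locations(heap, pool) | any_locations(heap, reps)
```

In this model a client write to `("c3", "val")` would fall inside `boundary` and would therefore have to go through a module method.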

While encapsulation is specified at the level of modules, separation or locality at finer granularities can be specified using module invariants. The stack interface defines a public invariant stkPub which asserts that the rep fields of all Stack objects in pool are disjoint. This idiom can be used to ensure that modifying one object has no effect on any locations in the representation of another. Clients can rely on public invariants during verification, but modules implementing the interface must ensure they are preserved by module methods. Additionally, modules may define private invariants that capture conditions on internal state; provided these refer only to encapsulated locations, i.e., the designated boundary frames these invariants, clients are exempt from reasoning about them [14].

```
module Client =
  meth prog (n: int) : int
    requires { 0 ≤ n < capacity ∧ ... }
    effects { rw alloc, pool, pool'any, pool'rep'any; rd n, capacity }
  = var i: int in var c: Cell in
    var stk: Stack in stk := new Stack; Stack(stk);
    while (i < n) do push(stk,i); i:=i+1 done; i := 0;
    while (i < n) do c:=pop(stk); result:=result+getVal(c); i:=i+1 done;
  meth prog (n: int|n: int) : (int|int) /* Relational spec for prog */
    requires { n ¨= n ∧ Both(0 ≤ n < capacity ∧ ... ) }
    ensures { result ¨= result }
```
Fig. 2: Example client for STACK and relational spec for equivalence

Finally, the STACK interface defines specs for initializers (methods Cell and Stack) and public specs for client-visible methods getVal, push, and pop. Notice that the stack initializer ensures self is added to the boundary (through post self ∈ pool) and stack operations require self to be part of the boundary (through pre self ∈ pool). Specs for push and pop are standard, using "old" expressions to precisely capture field updates. WhyRel's assertion language is first-order and includes constructs such as the points-to assertion x.f = e and operations on regions such as subset and membership. In addition to pre- and post-conditions, each method is annotated with a frame condition in an effects clause that serves to constrain heap effects of implementations. Allowable effects are expressed using read/write (rw) or read (rd) of locations or location sets, described by regions. For example, the effects clause for push says that implementations may read/write any field of self and any field of any objects in self.rep. The distinguished variable alloc is used to indicate that push may dynamically allocate objects.
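The write part of the effects clause for push can be read as a check on pre- and post-states. The Python sketch below is only an illustration under stated assumptions: heaps are dictionaries from references to field records, the region self.rep is evaluated in the pre-state, and the function names `changed_refs` and `push_respects_frame` are hypothetical. WhyRel itself discharges such conditions as verification conditions rather than by runtime comparison.

```python
def changed_refs(pre, post):
    """Objects allocated in `pre` that have at least one field changed in `post`."""
    return {ref for ref in pre if post.get(ref) != pre[ref]}


def push_respects_frame(pre, post, self_ref):
    # rw {self}'any, self.rep'any: fields of self and of objects in self.rep,
    # with self.rep read in the pre-state (an assumption of this sketch).
    writable = {self_ref} | set(pre[self_ref]["rep"])
    # Freshly allocated objects are absent from `pre`, so they are never
    # reported as changed; this models the `alloc` effect.
    return changed_refs(pre, post) <= writable


pre = {
    "s1": {"rep": {"c1"}, "size": 1},
    "s2": {"rep": set(), "size": 0},
    "c1": {"val": 7},
}
# A well-behaved push(s1, 8): allocates c2 and updates only s1.
good_post = {
    "s1": {"rep": {"c1", "c2"}, "size": 2},
    "s2": {"rep": set(), "size": 0},
    "c1": {"val": 7},
    "c2": {"val": 8},
}
# An ill-behaved push that also touches the unrelated stack s2.
bad_post = dict(good_post, s2={"rep": set(), "size": 5})

ok = push_respects_frame(pre, good_post, "s1")
violation = push_respects_frame(pre, bad_post, "s1")
```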

In our development, we build two modules that implement the interface in Fig. 1: one using arrays, ArrayStack, and another using linked lists, ListStack. Both rely on private invariants on encapsulated state that capture constraints on their pointer representations and their relation to abs, the mathematical abstraction of stack objects. The private invariant of ListStack, for example, says that Cell values in the linked list of any Stack in pool are in correspondence with values stored in abs.

Example client, equivalence spec, and verification. We now turn attention to an example client, prog, shown in Fig. 2. This program computes the sum $\sum_{i=0}^{n} i$, albeit in a roundabout fashion, using a stack. The frame condition of prog mentions the boundary for STACK, but this is fine since the client respects WhyRel's encapsulation discipline, modifying encapsulated locations solely through calls to methods declared in the STACK interface. For this client, our goal is to establish equivalence when linked against either implementation of STACK. Let the left program be the client linked against ArrayStack, and the right the client linked against ListStack. Equivalence is expressed using the relational spec shown in Fig. 2. For brevity, we omit frame conditions when describing relational specs.

```
meth prog (n: int | n: int) : (int | int)
= var i: int | i: int in var c: Cell | c: Cell in
  var stk: Stack in ⌊ stk := new Stack ⌋; ⌊ Stack(stk) ⌋;
  while (i < n) | (i < n) do ⌊ push(stk,i) ⌋; ⌊ i:=i+1 ⌋ done; ⌊ i:=0 ⌋;
  while (i < n) | (i < n) do ⌊ c:=pop(stk) ⌋;
    ⌊ result:=result+getVal(c) ⌋; ⌊ i:=i+1 ⌋ done;
Fig. 3: Alignment for example stack client

This relational spec relates two versions of prog; the notation (n:int | n:int) is used to declare that both versions expect n as argument. The pre-relation requires equality of inputs: n ¨= n says that the value of n on the left is equal to the value of n on the right. We use (¨=) instead of (=) to distinguish between values on the left and the right.<sup>4</sup> The relational spec requires the two states being related to satisfy the unary precondition for the client, as indicated by Both(...). The post-relation, result ¨= result, asserts equality on returned values. In WhyRel, relational specs capture a ∀∀ termination-insensitive property: terminating executions of the programs being related, when started in states related by the pre-relation, will result in states related by the post-relation.
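The ∀∀ reading of the relational spec can be illustrated by running the client against two stack implementations and comparing results on related inputs. The Python sketch below is a test harness, not a proof and not WhyRel output; the classes `ArrayStack` and `ListStack` are simplified stand-ins written for this example, the stack holds plain integers instead of Cell objects, and the capacity of 64 is an arbitrary assumption.

```python
CAPACITY = 64  # assumed bound, standing in for the global `capacity`


class ArrayStack:
    def __init__(self):
        self.slots = [0] * CAPACITY
        self.size = 0

    def push(self, k):
        self.slots[self.size] = k
        self.size += 1

    def pop(self):
        self.size -= 1
        return self.slots[self.size]


class _Node:
    def __init__(self, val, nxt):
        self.val, self.nxt = val, nxt


class ListStack:
    def __init__(self):
        self.head = None
        self.size = 0

    def push(self, k):
        self.head = _Node(k, self.head)
        self.size += 1

    def pop(self):
        node = self.head
        self.head = node.nxt
        self.size -= 1
        return node.val


def prog(stack_cls, n):
    """The client of Fig. 2, with `result` and `i` assumed to start at 0."""
    stk = stack_cls()
    result, i = 0, 0
    while i < n:
        stk.push(i)
        i += 1
    i = 0
    while i < n:
        result += stk.pop()
        i += 1
    return result


def check_relational_spec(n_left, n_right):
    # Pre-relation: inputs agree and both satisfy the unary precondition.
    if not (n_left == n_right and 0 <= n_left < CAPACITY):
        return True  # the spec constrains only related initial states
    # Post-relation: returned values agree.
    return prog(ArrayStack, n_left) == prog(ListStack, n_right)


all_ok = all(check_relational_spec(a, b) for a in range(10) for b in range(10))
```

Testing of this kind covers only the sampled inputs, whereas the relational spec quantifies over all pairs of terminating executions.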

WhyRel supports two approaches to verifying relational properties. The first reduces to proving functional properties of the programs involved. For instance, equivalence of the client when linked against the two stack implementations is immediate if we prove that prog indeed computes the sum of the first n nonnegative integers.

However, this approach neither lends itself well to more complicated programs and relational properties, nor does it allow us to exploit similarities between related programs or reason modularly using relational specs. The alternative is to prove the relational property using a convenient alignment of the two programs. Alignments are represented syntactically in WhyRel using biprograms, which pair points of interest between two programs so that their effects can be reasoned about in tandem. If the chosen alignment is adequate in the sense of capturing all pairs of executions of the related programs, relational properties of the alignment entail the corresponding relation between the underlying programs.

The biprogram for prog is shown in Fig. 3. The alignment it captures is maximal: every control point in one version of the client is paired with itself in the other version. The construct (C|C′) pairs a command C on the left with a command C′ on the right, and the sync form ⌊C⌋ is syntactic sugar for (C|C); e.g., the biprogram for prog aligns the two allocations using ⌊stk := new Stack⌋. Further, this biprogram aligns both loops in lockstep, indicated using the syntax while e|e' do ... done. This alignment pairs a loop iteration on the left with a loop iteration on the right and requires the loop guards be in agreement: here, that i < n on the left is true just when i < n on the right is. Calls to stack operations are aligned in the loop body using the sync construct to facilitate modular verification of relational properties by indicating that relational specs for push and pop are to be used.

<sup>4</sup> Note in particular that x ¨= y is not the same as y ¨= x.

```
bimodule REL_STACK (ArrayStack | ListStack) =
  coupling stackCoupling = ∀ s: Stack ∈ pool | s: Stack ∈ pool.
    s ¨= s ⇒ s.abs ¨= s.abs ∧ ...
  meth Stack(self: Stack | self: Stack) : (unit | unit)
    ensures {self ¨= self ∧ ...} = /* biprogram for Stack */
  meth push(self: Stack | self: Stack) : (unit | unit)
    requires {self ¨= self ∧ ... }
    ensures {self.abs ¨= self.abs ∧ ... } = /* biprogram for push */
  meth pop(self:Stack | self:Stack) : (Cell | Cell)
    requires {self ¨= self ∧ Both (self ∈ pool) ∧ Both (self.size > 0)}
    ensures {... ∧ result.val ¨= result.val} = /* biprogram for pop */
```
Fig. 4: Bimodule for Stack; excerpts

To prove the spec (in Fig. 2) about the biprogram in Fig. 3 we reason as follows: after allocation, stk on both sides is initialized to be the empty stack. The first lockstep aligned loop, which pushes integers from 0, ..., n, maintains as invariant equality on i and on the mathematical abstractions the two stacks represent, i.e., i ¨= i ∧ stk.abs ¨= stk.abs. The second lockstep aligned loop, which pops the stacks and increments result, maintains as invariant agreement on the stack abstractions and result, the key conjunct being result ¨= result. This is sufficient to establish the desired post-relation. Importantly, the loop invariants are simple to prove, since they only contain equalities between variables, and we don't have to reason about the exact contents of the two stacks involved.

Relational specs for Stack and verification. The reasoning described above relies on knowing the method implementations in ArrayStack and ListStack are equivalent. We need relational specs for push which state that given related inputs, the contents represented by the two stacks are the same; and for pop, which state that given related inputs, the values of the returned Cells are the same.

Fig. 4 shows a bimodule, REL\_STACK, relating the two implementations of STACK. It includes relational specs for the stack operations along with biprograms used for verification. The bimodule maintains a coupling relation which relates data representations used by the two stack implementations. Concretely, the coupling here states that related stacks in pool represent the same abstraction. Note that quantifiers in relation formulas bind pairs of variables, and that the equality s ¨= s in stackCoupling is not strict pointer equality, but indicates correspondence. Strict pointer equality is too strong as it would not allow for modeling allocation as a nondeterministic operation or permit differing allocation patterns between programs being related. Behind the scenes, WhyRel maintains a partial bijection π between allocated references in the two states being related. The relation x ¨= y, where x and y are pointers, states that x in the left state is in correspondence with y in the right state w.r.t. π, i.e., π(x) = y.
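The role of the partial bijection π can be sketched in a few lines of Python. The class `PartialBijection` and its methods `relate` and `agree` are names invented for this illustration and do not correspond to WhyRel's generated WhyML; references are modeled as strings, and the treatment of null is omitted.

```python
class PartialBijection:
    """A partial bijection between left-state and right-state references."""

    def __init__(self):
        self.l2r = {}  # left reference  -> right reference
        self.r2l = {}  # right reference -> left reference

    def relate(self, left, right):
        """Record that `left` corresponds to `right`, keeping the map injective."""
        if self.l2r.get(left, right) != right or self.r2l.get(right, left) != left:
            raise ValueError("pair would break the bijection")
        self.l2r[left] = right
        self.r2l[right] = left

    def agree(self, left, right):
        """Model of x ¨= y on references: pi(left) is defined and equals right."""
        return self.l2r.get(left) == right and left in self.l2r


pi = PartialBijection()
# The two programs may allocate in different orders or at different addresses.
pi.relate("a1", "b7")
pi.relate("a2", "b3")

same = pi.agree("a1", "b7")      # corresponding references
crossed = pi.agree("a1", "b3")   # allocated on both sides but not related
swapped = pi.agree("b7", "a1")   # the relation is not symmetric in its arguments
```

Because correspondence is tracked through π rather than by comparing addresses, the two sides remain free to allocate different numbers of objects between related allocations.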

The relational spec for the initializer Stack ensures self ¨= self, which is required in the specs for push and pop. Like other invariants, coupling relations are meant to be framed by the boundary and are required to be preserved by module methods being related. Encapsulation allows for coupling relations to be hidden so that clients are exempt from reasoning about them.

```
/* left version */
meth mult(n: int, m: int) =
  i := 0;
  while (i < n) do j:=0;
    while (j < m) do
      result := result+1; j := j+1
    done; i := i+1 done;

/* right version */
meth mult(n:int, m:int) =
  i := 0;
  while (i < n) do
    result := result+m;
    i := i+1
  done;
```
Fig. 5: Two versions of a simple multiplication routine

The steps taken to complete the Stack development and verify equivalence of two versions of its client are as follows: (i) build the STACK interface in WhyRel, with public invariants clients can rely on and a boundary that designates encapsulated locations; (ii) develop two modules refining this interface, ArrayStack and ListStack, and verify that their implementations conform to STACK interface specs, relying on any private invariants that capture conditions on encapsulated state; (iii) provide a bimodule relating the two stack modules and prove equivalence of stack operations, relying on a coupling relation that captures relationships between pointer structures used by the two modules; (iv) verify the client with respect to specs given in STACK and prove it respects WhyRel's encapsulation regime; and finally (v) develop a bimodule for the client and verify equivalence using relational specs for stack methods.

### 3 Patterns of alignment

Well chosen alignments help decompose relational verification, allowing for the use of simple relational assertions and loop invariants. In this section, we'll look at examples of biprograms that capture alignments that aren't maximal, unlike the STACK client example in Sec. 2. We don't formalize the syntax of biprograms here, but we show representative examples. When discussing examples, we'll omit frame conditions and other aspects orthogonal to alignment.

Differing control structures. Churchill et al. [8] develop a technique for proving equivalence of programs using state-dependent alignments of program traces. They identify a challenging problem for equivalence checking, shown in Fig. 5, which compares two procedures for multiplication with different control flow. For automated approaches to relational verification, their example is challenging because of the need to align an unbounded number m of loop iterations on the left with a single iteration on the right.

To prove equivalence, we verify the biprogram shown in Fig. 6 with respect to a relational spec with pre-relation n ¨= n ∧ m ¨= m and post-relation result ¨= result; i.e., agreement on inputs results in agreement of outputs. Unlike the stack client biprogram shown in Fig. 3, the alignment embodied here is not maximal; indeed, such an alignment would not be possible due to the differing control structure. Similarities are still exploited by aligning the outer loops in lockstep and the left inner loop with the assignment to result on the right.

```
meth mult(n: int, m: int | n: int, m: int) : (int | int) =
  ⌊ i := 0 ⌋;
  while (i < n) | (i < n) do invariant { i ¨= i ∧ result ¨= result }
    ( j := 0; while (j < m) do result := result+1; j := j+1 done
    | result := result+m );
    assert { ⟨[ result = old(result)+m ⟨] };
    ⌊ i := i+1 ⌋ done;
```
Fig. 6: Biprogram for example in Fig. 5

```
/* unary program */
meth sumpub (l: List) : int =
  p:=l.head; s:=0;
  while (p ≠ null) do
    if p.pub then
      s:=s+p.val
    end;
    p:=p.nxt
  done;
  result:=s;

/* biprogram (alignment) */
meth sumpub (l: List | l: List) : int =
  ⌊ p:=l.head ⌋; ⌊ s:=0 ⌋;
  while (p ≠ null) | (p ≠ null) .
    ⟨[ ¬ p.pub ⟨] | [⟩ ¬ p.pub ]⟩ do
    ( if p.pub then s:=s+p.val end;
      p:=p.nxt
    | if p.pub then s:=s+p.val end;
      p:=p.nxt)
  done; ⌊ result:=s ⌋;
```
Fig. 7: Summing up public elements of a linked list: program and alignment

A simple relational loop invariant which asserts agreement on i and result is sufficient for proving equivalence. To show this is invariant, we need to establish that the inner loop on the left has the effect of incrementing result by m, thereby maintaining equality on result after the inner loop. In Fig. 6 this is indicated by the assertion after the left inner loop. The notation ⟨[P⟨] (resp. [⟩P]⟩) is used to state that the unary formula P holds in the left (resp. right) state.
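The alignment for the multiplication example can be mimicked by an ordinary program that runs both versions side by side and checks the relational invariant at runtime. The Python sketch below is an illustration only: the function name `mult_biprogram` is hypothetical, `result` is assumed to start at 0 on both sides, and m is assumed to be nonnegative (an assumption added here so that the left-side assertion holds).

```python
def mult_biprogram(n_left, m_left, n_right, m_right):
    """Run both versions of mult in the alignment of the biprogram."""
    # Pre-relation: agreement on inputs (plus the assumed m >= 0).
    assert n_left == n_right and m_left == m_right and m_left >= 0
    res_l = res_r = 0
    i_l = i_r = 0
    while True:
        guard_l, guard_r = i_l < n_left, i_r < n_right
        assert guard_l == guard_r                 # lockstep: guards agree
        assert i_l == i_r and res_l == res_r      # relational loop invariant
        if not guard_l:
            break
        old_res_l = res_l
        # Left side: the whole inner loop is one aligned step.
        j = 0
        while j < m_left:
            res_l += 1
            j += 1
        # Right side: a single assignment.
        res_r += m_right
        # Left-state assertion: the inner loop incremented result by m.
        assert res_l == old_res_l + m_left
        i_l += 1
        i_r += 1
    assert res_l == res_r                         # post-relation
    return res_l, res_r


products = mult_biprogram(3, 4, 3, 4)
```

The outer loop's relational invariant mentions only equalities; the functional fact about the inner loop is confined to the one left-side assertion.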

Conditionally aligned loops. Examples so far have concerned lockstep aligned loops, requiring a one-to-one correspondence between loop iterations. However, this condition is often too restrictive. WhyRel provides for other patterns of loop alignment, including those that account for conditions on data values. Consider for example the program shown in Fig. 7 which traverses a linked list and computes the sum of all elements marked public, indicated in each element's pub field. The program satisfies the following noninterference property, with relational spec:

```
meth sumpub(l: List | l: List) : (int | int)
  requires { Both(listpub(l,xs)) ∧ xs ¨= xs }
  ensures { result ¨= result }
```
Here listpub(l,xs) is a predicate which asserts that the sequence of public values reachable from the list pointer l is realized in xs, a mathematical list of integers. Intuitively, this specification captures the property that the result of sumpub does not depend on the values of nonpublic elements in the input list l. Showing that the program computes exactly the sum of public elements, i.e., result = sum(xs), would imply the desired noninterference property. However, to showcase the support WhyRel offers for non-lockstep alignments, we'll establish noninterference by conditionally aligning the loops in the two copies of sumpub (see Fig. 7).

The alignment is as follows: if p is a nonpublic node on one side, perform a loop iteration on that side, pausing the iteration on the other; and if p on both sides is public, perform lockstep iterations of both loops. This has the effect of incrementing s exactly when both sides are visiting public nodes, the values of which are guaranteed to be the same by the relational precondition. The biprogram expresses this alignment through the use of additional annotations, called alignment guards, which are general relation formulas and express conditions that lead to left-only, right-only, or lockstep iterations. The left alignment guard ⟨[¬ p.pub⟨] indicates that left-only loop iterations are to be performed when p on the left is not public. The right alignment guard expresses a similar condition when p on the right is not public. Iterations proceed in lockstep when both alignment guards are false, i.e., when Both(p.pub) is true.

This biprogram maintains ∃ xs|xs. Both(listpub(p,xs)) ∧ xs ¨= xs ∧ s ¨= s as loop invariant, which implies the desired post-relation. This invariant states that p on both sides points to the same sequence of public values as captured by listpub(p,xs) and that there is agreement on the sum s computed so far. During verification, we must establish that left-only, right-only, and lockstep iterations of the aligned loops preserve this invariant. Due to the alignment, the value of s is only updated during lockstep iterations and it is straightforward to show preservation. For one-sided iterations, reasoning relies on knowing that the sequence of public values pointed to by p remains the same.
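The conditional alignment and its invariant can be exercised with a small Python sketch. It makes several simplifying assumptions that are not part of WhyRel: each linked list is modeled as a Python list of `(pub, val)` pairs, the pointer p as an index, and listpub as the helper `public_values`; the function name `sumpub_biprogram` is likewise hypothetical.

```python
def public_values(nodes, start=0):
    """Sequence of public values reachable from position `start`."""
    return [val for pub, val in nodes[start:] if pub]


def sumpub_biprogram(left, right):
    """Run two copies of sumpub under the conditional alignment."""
    # Pre-relation: both lists realize the same sequence of public values.
    assert public_values(left) == public_values(right)
    p_l = p_r = 0
    s_l = s_r = 0
    while p_l < len(left) or p_r < len(right):
        # Loop invariant: same remaining public values, same partial sums.
        assert public_values(left, p_l) == public_values(right, p_r)
        assert s_l == s_r
        # Alignment guards: the current node exists and is not public.
        left_only = p_l < len(left) and not left[p_l][0]
        right_only = p_r < len(right) and not right[p_r][0]
        if left_only:
            p_l += 1                      # left-only iteration, s unchanged
        elif right_only:
            p_r += 1                      # right-only iteration, s unchanged
        else:
            # Lockstep iteration: both guards hold and both nodes are public.
            assert p_l < len(left) and p_r < len(right)
            s_l += left[p_l][1]
            s_r += right[p_r][1]
            p_l += 1
            p_r += 1
    assert s_l == s_r                     # post-relation
    return s_l, s_r


# Lists that agree on public values but differ in nonpublic nodes.
left = [(True, 1), (False, 99), (True, 2)]
right = [(False, 7), (True, 1), (True, 2), (False, 5)]
sums = sumpub_biprogram(left, right)
```

In the run above the nonpublic nodes are consumed by one-sided iterations, so the sums are updated only in the two lockstep iterations.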

### 4 Encoding and design

We implement WhyRel in OCaml, relying on a library provided by Why3 for constructing WhyML parse trees. Source programs are parsed and typechecked before being translated to WhyML. Prior to translation, WhyRel performs a variety of checks and transformations: primary among these is a check that clients respect encapsulation and that any biprograms provided by users are adequate. Proof obligations pertinent to relational region logic are generated in the form of intermediate assertions in WhyML programs and lemmas for the user to prove. In this section, we provide an overview of some aspects of our implementation, focusing on the translation to WhyML.

Encoding program states. References are represented using an abstract WhyML type reference with a distinguished element, null. The only operation supported on reference values is equality; WhyRel does not deal with pointer arithmetic. Regions are encoded as ghost state, using a library for mathematical sets provided by Why3. Set operations on regions are inherently supported, and we axiomatize image expressions: for each field f, WhyRel generates a Why3 function symbol img\_f along with an axiom that captures the meaning of G'f.

Program states are encoded using WhyML records. An example is shown in Fig. 8.

```
/* class defs */
class Cell {
  val: int;
  ghost rep: rgn; }
class Node {
  curr: Cell;
  nxt: Node; }
/* global vars */
public pool : rgn
```

```
type reftype = Cell | Node (*class names*)
type heap = {
  mutable val: map reference int;
  mutable ghost rep : map reference Rgn.t;
  mutable curr: map reference reference;
  mutable nxt: map reference reference }
type state = {
  mutable alloct: map reference reftype;
  mutable heap: heap;
  mutable ghost pool: rgn }
invariant {¬(Map.mem null alloct) ∧ ...}
(* axiomatization of G'nxt *)
function img_nxt : state → Rgn.t → Rgn.t
axiom img_nxt_ax : ∀ s, r, p.
  Rgn.mem p (img_nxt s r) ⇔ ∃ q.
      s.alloct[q] = Node ∧ Rgn.mem q r
    ∧ p = s.heap.nxt[q]
```
Fig. 8: State encoding: WhyRel source (first listing) and its encoding in WhyML (second listing).

The state type includes at least two mutable components called alloct and heap. The component alloct stores a map from references to object types and keeps track of allocated objects; heap is itself a record with one mutable component per field in the source program that stores a map from references to values. The set of values includes references, Why3 mathematical types such as arrays and lists, regions, and primitive types such as int and bool. In addition, the state type contains one mutable field per global variable in the source program, storing a value of the appropriate type. The state type is annotated with a WhyML invariant that captures well-formedness. This invariant includes conditions such as null never being allocated, no dangling references, and typing constraints: for example, the nxt field of a Node is itself a Node.
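To make the well-formedness conditions concrete, here is a small executable analogue in C. It is a sketch under assumptions rather than the WhyML encoding itself: references are modeled as array indices, index 0 plays the role of null, `UNALLOC` marks references absent from alloct, and the ghost components rep and pool are left out. The names `NREFS`, `NULL_REF`, `has_type`, and `well_formed` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define NREFS 6      /* hypothetical bound on the number of references */
#define NULL_REF 0   /* index standing in for the null reference */

enum reftype { UNALLOC, CELL, NODE };

/* One array per source field, indexed by reference. */
struct heap {
    int val[NREFS];
    int curr[NREFS];
    int nxt[NREFS];
};

struct state {
    enum reftype alloct[NREFS];
    struct heap heap;
};

/* A field value is acceptable if it is null or an allocated object of
 * the expected class. */
static bool has_type(const struct state *s, int r, enum reftype ty) {
    assert(0 <= r && r < NREFS);
    return r == NULL_REF || s->alloct[r] == ty;
}

/* Checks: null is never allocated, and the reference-valued fields of
 * every Node are null or point to allocated objects of the right class
 * (so there are no dangling references through curr or nxt). */
bool well_formed(const struct state *s) {
    if (s->alloct[NULL_REF] != UNALLOC) return false;
    for (int r = 1; r < NREFS; r++) {
        if (s->alloct[r] != NODE) continue;
        if (!has_type(s, s->heap.curr[r], CELL)) return false;
        if (!has_type(s, s->heap.nxt[r], NODE)) return false;
    }
    return true;
}
```

In the WhyML encoding these conditions are part of the type invariant and must be re-established by every heap update. The function above only checks them for a given state.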

Translating unary programs and effects. WhyRel translates unary programs into WhyML functions that act on our encoding of states. Commands that modify the heap are modeled as updates to an explicit state parameter, and local variables, parameters, and the distinguished result variable are encoded using WhyML reference cells. Object parameters are modeled using the reference type and a typing assumption. Translation of control flow statements is straightforward. For programs with loops, WhyRel additionally adds a diverges clause to the generated WhyML function: this indicates that the function may potentially diverge, avoiding generation of VCs for proving termination. While Why3 supports reasoning about total correctness, we're only concerned with partial correctness. Fig. 9 shows an example translation.

```
meth m (c: Cell, i: int) : int
  requires { c.val ≥ 0 }
= while (i ≥ 0) do
    invariant { c.val ≥ 0 }
    c.val := c.val+i;
    i := i-1
  done;
  result := c.val
```

```
let m (s:state) (c:reference) (i:int)
  : int diverges
  requires { s.alloct[c] = Cell }
  requires { s.heap.val[c] ≥ 0 }
= let result = ref 0 in
  let c = ref c in
  let i = ref i in
  while (!i ≥ 0) do
    invariant { s.heap.val[!c] ≥ 0 }
    (* c.val := c.val + i *)
    s.heap.val ← Map.add !c
      (s.heap.val[!c]+!i) s.heap.val;
    i := !i-1
  done;
  result := s.heap.val[!c]; !result
```
Fig. 9: Program translation example: WhyRel program (first listing) and its WhyML translation (second listing); frame conditions omitted.

Translation of frame conditions requires care given our encoding of states. As an example, the writes for method m shown in Fig. 9 would include rw {c}'val due to the write to, and read of, field val of object c. Correspondingly, in the Why3 translation, component val of s.heap is updated; so specifying the function in Why3 requires adding writes {s.heap.val} as annotation. However, this isn't the granularity we want since it implies the field val of any reference can be written. Hence, WhyRel generates an additional postcondition for method m: wr\_framed\_val (old s) s (Rgn.singleton c), where

predicate wr\_framed\_val (s: state) (t: state) (r: rgn) = ∀ p: reference. s.alloct[p] = Cell ∧ p ∉ r ⇒ s.heap.val[p] = t.heap.val[p]

With this postcondition, callers of m (in WhyML) can rely on knowing that the val fields of only references in {c} are modified.
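The frame postcondition can be read as a check over two snapshots of the state. The C sketch below illustrates that reading under assumptions: references are array indices, a region is a bitmask over those indices, and the helper names `rgn_mem` and `rgn_singleton` as well as the flattened `struct state` are hypothetical simplifications of the encoding shown in Fig. 8.

```c
#include <assert.h>
#include <stdbool.h>

#define NREFS 6   /* hypothetical bound on the number of references */

enum reftype { UNALLOC, CELL, NODE };

/* Simplified state: allocation table plus the val field only. */
struct state {
    enum reftype alloct[NREFS];
    int val[NREFS];
};

/* Regions as bitmasks over reference indices (an assumption made
 * for this sketch). */
typedef unsigned rgn;
static bool rgn_mem(int p, rgn r) { return ((r >> p) & 1u) != 0; }
static rgn rgn_singleton(int p) { return 1u << p; }

/* For every Cell p allocated in s and outside region r, the val field
 * is the same in s and t. */
bool wr_framed_val(const struct state *s, const struct state *t, rgn r) {
    for (int p = 0; p < NREFS; p++) {
        if (s->alloct[p] == CELL && !rgn_mem(p, r) &&
            s->val[p] != t->val[p])
            return false;
    }
    return true;
}

/* Analogue of method m from Fig. 9, acting on an explicit state. */
int m(struct state *s, int c, int i) {
    assert(s->alloct[c] == CELL && s->val[c] >= 0);
    while (i >= 0) {
        s->val[c] += i;
        i--;
    }
    return s->val[c];
}
```

Copying the state before a call to `m` and then evaluating `wr_framed_val(&old, &s, rgn_singleton(c))` mirrors the generated postcondition: only the val field of the reference in {c} may differ between the two snapshots.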

Biprograms. WhyRel translates biprograms into product programs; specifically, WhyML functions that act on a pair of states<sup>5</sup>. Before translation, it performs an adequacy check to ensure the biprogram is well-formed. Recall that adequacy here means that all computations of the underlying unary programs are covered by their aligned biprogram. Adequacy ensures that a relational judgment about the biprogram entails the expected relation between the underlying unary programs. The check WhyRel performs is syntactic and defined using projection operations on biprograms. Given a biprogram CC, the left projection $\overset{\leftharpoonup}{CC}$ (resp. the right projection $\overset{\rightharpoonup}{CC}$) extracts the unary program on the left (resp. the right). As an example, the left projection of ⌊c.f:=g⌋; (x:=c.f | skip) is c.f:=g; x:=c.f and its right projection is c.f:=g. For adequacy, given unary programs C and C′ and their aligned biprogram CC, it suffices to check whether $\overset{\leftharpoonup}{CC} \equiv C$ and $\overset{\rightharpoonup}{CC} \equiv C'$ [1].
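The projection-based check can be illustrated on a deliberately tiny fragment of the biprogram syntax. The C sketch below makes several simplifying assumptions: a biprogram is a flat sequence of syncs and splits, commands are opaque strings, the equivalence ≡ is approximated by dropping skip and comparing the remaining text, and all names (`bi_item`, `project`, `adequate`) are hypothetical. It does not reflect how WhyRel represents biprograms internally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum kind { SYNC, SPLIT };   /* sync of C, or split (C | C') */
enum side { LEFT, RIGHT };

/* One element of a sequential biprogram. For SYNC both sides hold the
 * same command. */
struct bi_item {
    enum kind kind;
    const char *left;
    const char *right;
};

/* Writes the chosen projection of cc[0..len) into out (capacity n),
 * joining commands with "; " and omitting skip. */
void project(const struct bi_item *cc, size_t len, enum side side,
             char *out, size_t n) {
    assert(n > 0);
    out[0] = '\0';
    for (size_t i = 0; i < len; i++) {
        const char *cmd = (side == LEFT) ? cc[i].left : cc[i].right;
        if (strcmp(cmd, "skip") == 0) continue;
        if (out[0] != '\0') strncat(out, "; ", n - strlen(out) - 1);
        strncat(out, cmd, n - strlen(out) - 1);
    }
}

/* Adequacy in this sketch: both projections coincide textually with
 * the given unary programs. */
bool adequate(const struct bi_item *cc, size_t len,
              const char *c, const char *c_prime) {
    char buf[256];
    project(cc, len, LEFT, buf, sizeof buf);
    if (strcmp(buf, c) != 0) return false;
    project(cc, len, RIGHT, buf, sizeof buf);
    return strcmp(buf, c_prime) == 0;
}
```

Encoding the example from the text as a sync of `c.f:=g` followed by the split `(x:=c.f | skip)` yields the left projection `c.f:=g; x:=c.f` and the right projection `c.f:=g`.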

<sup>5</sup> In reality, generated WhyML functions act on a pair of states and a bijective renaming of references allocated in these states. This is to cater for relation formulas such as $x \mathrel{\ddot{=}} y$ where x and y are references. However, this additional parameter is not important to our discussion here, so we avoid mentioning it.

Translation of biprograms is described in Fig. 10. The translation function B takes a biprogram and a pair of contexts $(\Gamma_l, \Gamma_r)$ to a WhyML program. In addition to mapping WhyRel identifiers to WhyML identifiers, contexts store information about the state parameters on which the generated WhyML program acts. Similar to B, the function U translates unary programs to WhyML programs, E, expressions to WhyML expressions, and F, a restricted set of relation formulas to WhyML expressions. Biprograms don't require the underlying unary programs to act on a disjoint set of variables; however, this means that WhyRel has to perform appropriate renaming during translation. Renaming is manifest in the translation of variable blocks (var x:T|x:T' in CC), where the context $\Gamma_l$ is extended to $[\Gamma_l \mid x : x_l]$, mapping x to a renamed copy $x_l$ (and resp. $\Gamma_r$ is extended with the binding $x : x_r$).

Fig. 10: Translation of biprograms, excerpts

In translating (C|C′), the unary translations of C and C′ are sequentially composed. Syncs ⌊C⌋ are handled similarly, as syntactic sugar for (C|C), except for the case of method calls. Procedure-modular reasoning about relational properties is enabled by aligning method calls, which indicates that the relational spec associated with the method is to be exploited. WhyRel translates these to calls to the appropriate WhyML product program, using a global method context (Φ in Fig. 10). Since translated product programs act on pairs of states, the generated WhyML call takes $\Gamma_l.\mathit{st}$ and $\Gamma_r.\mathit{st}$, names for left and right state parameters, as additional arguments.

Product constructions for control flow statements require generating additional proof obligations. For aligned conditionals, WhyRel introduces an assertion that the guards are in agreement. Lockstep aligned loops are dealt with similarly; guard agreement must be invariant. For conditionally aligned loops, the generated loop body captures the pattern indicated by the alignment guards P|P′: if the left (resp. right) guard is true and P (resp. P′) holds, perform a left-only (resp. right-only) iteration; otherwise, perform a lockstep iteration. Adequacy is ensured by requiring the condition A to be invariant. This condition states that until both sides terminate, the loop can perform a lockstep or a one-sided iteration. In relational region logic, the alignment guards P and P′ can be any relational formula. However, the encoding of conditionally aligned loops is in terms of a conditional that branches on these alignment guards. In Why3, this only works if P and P′ are restricted; for example, to not contain quantifiers. WhyRel supports alignment guards that include agreement formulas, one-sided points-to assertions, one-sided boolean expressions, and the usual boolean connectives.

Proof obligations for encapsulation. To ensure sound encapsulation, WhyRel performs an analysis on source programs. This analysis includes two parts: a static check to ensure client programs don't directly write to variables in a module's boundary; and the generation of intermediate assertions that express disjointness between the footprints of client heap updates and regions demarcated by module boundaries. For modules with public/private invariants, WhyRel additionally generates a lemma which states that the module's boundary frames the invariant, i.e., the invariant only depends on locations expressed by the boundary. The same is done with coupling relations, for which we need to consider boundaries of both modules being related. A technical condition of relational region logic requiring boundaries grow monotonically as computation proceeds is also ensured by introducing appropriate postconditions in generated programs.

### 5 Evaluation

We evaluate WhyRel via a series of case studies, representative of the challenge problems highlighted at the outset of this article. Examples include representation independence, optimizations such as loop tiling [5], and others from recent literature on relational verification (including [9] and [21]). Some, like those described in Sec. 3, deal with reasoning in terms of varying alignments including data-dependent ones. Our representation independence examples include showing equivalence of Dijkstra's single-source shortest-paths algorithm linked against two implementations of priority queues, which requires reasoning about fine-grained couplings between pointer structures; and Kruskal's minimum spanning tree algorithm linked against different modules implementing union-find, which requires couplings equating the partitions represented by the two versions. For all examples, VCs are discharged using the SMT solvers Alt-Ergo, CVC4, and Z3. Replaying proofs of most developments using Why3's saved sessions feature takes less than 30 minutes on a machine with an Intel Core i5-6500 processor and 32 gigabytes of RAM.

A primary goal of this work is to investigate whether verifying relational properties of heap manipulating programs can be performed in a manner tractable to SMT-based automation, and for the most part, we believe WhyRel provides a promising answer. The tool serves as an implementation of relational region logic and demonstrates that even its additional proof obligations for encapsulation can be encoded using first-order assertions. In fact, exploration of case studies using WhyRel was instrumental in designing proof rules of relational region logic.

Reasoning about heap effects à la region logic is generally simple and VCs get discharged quickly using SMT. However, technical lemmas WhyRel generates which pertain to showing that module boundaries frame private invariants and couplings require considerable manual effort to prove. These lemmas usually involve reasoning about image expressions, which involve existentials and nontrivial set operations on regions. Given our encoding of states and regions, SMT solvers seem to have difficulties solving these goals. Manual effort involves applying a series of Why3 transformations (or proof tactics) and introducing intermediate assertions. We conjecture that the issue can be mitigated by using specialized solvers [23] or different heap encodings [24].

Another issue with our encoding of typed program states is the generation of a large number of VCs related to well-formedness of states. These account for a substantial fraction of proof replay time. Why3 programs act directly on our minimally-typed state representation and each heap update needs to preserve an invariant that specifies constraints on the types of allocated references (see Fig. 8). Using Why3's support for module abstraction [12] may ameliorate this issue. An alternative is to use assumptions, which can be justified by correctness of the WhyRel type checker and translator.<sup>6</sup>

Apart from these challenges related to verification, we note that specs in region logic tend to be verbose when compared to other formalisms such as separation logic [4].

### 6 Related work

WhyRel is closely modeled on relational region logic, developed in [1]. That paper provides a high-level overview of WhyRel, using a small set of examples verified in the tool to motivate aspects of the formal logic; but it doesn't give a full presentation of the tool or go into details about the encoding. The paper provides comprehensive soundness proofs of the logic and shows how the VCs WhyRel generates and the checks it performs correspond closely to obligations of relational proof rules. The paper builds on a line of work on region logic [4,2,3]. The VERL tool implements an early version of unary region logic without encapsulation and was used to evaluate a decision procedure for regions [23].

For local reasoning about pointer programs, separation logic is an effective and elegant formalism. For relational verification, ReLoC [13], based on the Iris separation logic and built in the Coq proof assistant, supports language features such as dynamic allocation and concurrency, among many others. However, we are unaware of auto-active relational verifiers based on separation logic.

Alignments for relational verification have been explored in various contexts. In WhyRel, the biprogram syntax captures alignment based on control flow, but also caters to data-dependent alignment of loops through the use of alignment guards (as discussed in Sec. 3). Churchill et al. [8] develop a technique for equivalence checking by using data-dependent alignments represented by control-flow automata, which they use to prove correctness of a benchmark of vectorizing compiler transformations and hand-optimized code. Unno et al. [30] address a wide range of relational problems including k-safety and co-termination, expressing alignments and invariants as constraint satisfaction problems they solve using a CEGIS-like technique. Their work is applied to benchmarks proposed by Shemer et al. [25] who develop a technique for equivalence and regression verification. Both the above works represent alignments as transition systems and perform inference of relational invariants and alignment conditions. Inference relies on solvers and therefore programs need to be restricted so they are amenable to these solvers. A promising approach by Barthe et al. [6] reduces relational verification to proving formulas in trace logic, a multi-sorted first-order logic using first-order provers. In trace logic, conditions can be expressed on traces including relationships between different time points without recourse to alignment per se.

<sup>6</sup> The Boogie verification language provides "free requires" and "free ensures" syntax for just such assumptions.

Sousa and Dillig develop Descartes [26] for reasoning about k-safety properties of Java programs automatically using implicit product constructions and in a logic they term Cartesian Hoare logic. Their work is furthered by Pick et al. [22] who develop novel techniques for detecting alignments. The REFINITY [27] workbench based on the interactive KeY tool can be used to reason about transformations of Java programs; heap reasoning relies on dynamic frames and relational verification proceeds by considering abstract programs. Other related tools include SymDiff [18] which is based on Boogie and can modularly reason about program differences in a language-agnostic way, and LLRêve [16] for regression verification of C programs. Eilers et al. [10] develop an encoding of product programs for noninterference that facilitates procedure-modular reasoning. They verify a large collection of benchmark examples using the VIPER toolchain.

### 7 Conclusion

In this paper we present WhyRel, a prototype for relational verification of pointer programs that supports dynamic framing and state-based encapsulation. The tool faithfully implements relational region logic and demonstrates how its proof obligations, including those related to encapsulation, can be encoded in a first-order setting. We've carried out a number of representative examples in WhyRel, leveraging the support Why3 provides for SMT, and believe these demonstrate the amenability of region logic, and its relational variant, to automation.

Acknowledgments We thank the anonymous TACAS reviewers and artifact evaluators for their thorough feedback and suggestions which have led to major improvements in this paper. We thank Seyed Mohammad Nikouei who built an initial version of WhyRel which helped guide the design of the current version. Nagasamudram and Naumann were partially supported by NSF award 1718713. Banerjee's research was based on work supported by the NSF, while working at the Foundation. Any opinions, findings, and conclusions or recommendations expressed in this article are those of the authors and do not necessarily reflect the views of the NSF.

Data Availability Statement Sources for WhyRel and all examples performed using the tool are available in Zenodo with the identifier https://doi.org/10.5281/zenodo.7308342 [20].

### References


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## Bridging Hardware and Software Analysis with Btor2C: A Word-Level-Circuit-to-C Translator

Dirk Beyer, Po-Chun Chien, and Nian-Ze Lee

LMU Munich, Munich, Germany

Abstract. Across the broad research field concerned with the analysis of computational systems, research endeavors are often categorized by the respective models under investigation. Algorithms and tools are usually developed for a specific model, hindering their applications to similar problems originating from other computational systems. A prominent example of such a situation is the area of formal verification and testing for hardware and software systems. The two research communities share common theoretical foundations and solving methods, including satisfiability, interpolation, and abstraction refinement. Nevertheless, it is often demanding for one community to benefit from the advancements of the other, as analyzers typically assume a particular input format. To bridge the gap between hardware and software analysis, we propose Btor2C, a translator from word-level sequential circuits to C programs. We choose the Btor2 language as the input format for its simplicity and bit-precise semantics. It can be regarded as an intermediate representation tailored for analysis. Given a Btor2 circuit, Btor2C generates a behaviorally equivalent program in the language C, supported by many static program analyzers. We demonstrate the use cases of Btor2C by translating the benchmark set from the Hardware Model Checking Competitions into C programs and analyzing them with tools from the Intl. Competitions on Software Verification and Testing. Our results show that software analyzers can complement hardware verifiers for enhanced quality assurance: For example, the software verifier VeriAbs with Btor2C as preprocessor found more bugs than the best hardware verifiers ABC and AVR in our experiment.

Keywords: Hardware compilation · Word-level circuit · Intermediate representation · Formal verification · Testing · Btor2 · SMT · SAT

### 1 Introduction

Computational systems have become more and more ubiquitous in our daily life and manifest themselves in various contexts, including VLSI circuits, software programs, and cyber-physical systems. To construct reliable systems, quality assurance has become an indispensable research topic. Numerous endeavors have been invested for different computational systems. Because of the ever-increasing system complexity and applications in safety-critical missions, it is of vital importance to take advantage of all available solutions for different types of systems to guarantee the quality and correctness.

Formal verification and testing are two active fields of research to analyze and assure the quality of computational systems. The former decides with mathematical rigorousness whether a system conforms to a specification. The latter aims at generating input patterns and executing a system on a test suite to observe irregular output responses. Studies for formal verification or testing usually focus on a specific computational model, especially a sequential circuit (hardware) or a program (software). Tool competitions are also established based on modeling languages for input instances, such as the language Btor2 [64] used in the Hardware Model Checking Competitions (HWMCC) [28, 29], or the language C assumed by the Competitions on Software Verification (SV-COMP) [11, 14] and Testing (Test-Comp) [12, 13]. Unfortunately, such distinction erects a barrier between the two closely related research communities.

#### 1.1 Our Motivations and Contributions

For the hardware community to easily benefit from state-of-the-art software-analysis techniques, we aim at developing a lightweight yet effective translation flow to bridge the gap between hardware and software analysis. There have been several attempts [48, 62] to compile hardware designs into software, mostly using the language Verilog as the input format. Verilog is a general-purpose hardware description language, and thus, a comprehensive frontend for Verilog requires tremendous engineering effort. Moreover, Verilog has rather complicated syntax and semantics, which might increase the burden on the translation flow.

To address the complexity in the frontend design, we resort to the language Btor2 [64], proposed recently to model word-level sequential circuits. A suite Btor2Tools [63] of utility tools is also provided for conveniently parsing, simulating, and bit-blasting (to the bit-level format Aiger [26]) Btor2 circuits. We emphasize the following two benefits of using Btor2 as the translation frontend over Verilog. First, Btor2 provides simple yet sufficient operations over bit-vectors and arrays. The simplicity makes it an appropriate intermediate representation for formal verification and testing, as the operations are suitable for the underlying satisfiability solvers. Second, Btor2 is the input format used in the HWMCC. Many hardware model checkers support this format, and a large collection of benchmarking tasks is available for empirical evaluation. In practice, a Verilog circuit can be translated to Btor2 via Yosys [70], an open-source Verilog synthesis tool. Therefore, using Btor2 as frontend does not restrict the applicability of the translation flow.

Having settled on the frontend choice, our next question is: Should we make software analyzers support Btor2, or should we implement a standalone translator that does the job for all tools? We take the latter approach such that any software analyzer (from 76 available [25]) can in principle be used for hardware analysis. As opposed to using Verilog as frontend, the simplicity of the Btor2 language helps to generate C programs suitable for the backend analysis, as will be shown in Sect. 5 via comparison with the Verilog-to-C translator v2c [62].

Once a handy translator is viable, we are enthusiastic about empirically comparing hardware and software analyzers on a large scale. Similar experiments have been carried out for bounded [60] and unbounded [61] formal verification on a small set of circuits. By building a translator on top of the Btor2 language, more than a thousand benchmarking tasks from the HWMCC are at our immediate disposal. To draw a more reliable conclusion on the performance comparison of state-of-the-art hardware and software analyzers, we evaluate bit-level and word-level hardware model checkers from HWMCC, software verifiers from SV-COMP, and software testers from Test-Comp, on the HWMCC benchmark set.

Fig. 1: Software analysis made readily available for hardware designs

Our contributions in this paper are summarized below:

Novelty. (1) To bridge the gap between hardware and software analysis, we design and implement Btor2C, the first hardware-to-software compiler taking the format Btor2 [64] as input. Specifically, Btor2C accepts a Btor2 circuit and produces a behaviorally equivalent C program. Given a Verilog design, Btor2C (with the help of Yosys) makes off-the-shelf software verifiers and testers readily available for its analysis. In addition to bit-level and word-level analyzers, hardware developers will be equipped with more tool choices to perfect their designs, as shown in Fig. 1. (2) Btor2C makes it easy to construct new hardware analyzers by placing the translator in front of any software analyzer. (3) Applying Btor2C to the HWMCC benchmark set, we submitted 1224 new tasks<sup>1</sup> to sv-benchmarks, the benchmark collection used by many researchers, including SV-COMP and Test-Comp. Developers of software analyzers can now assess their tools using the hardware-analysis counterparts as a new baseline.

Significance. (1) We conduct a large-scale evaluation involving hardware model checkers, software verifiers, and software testers on the HWMCC benchmark set. Our results show that software-analysis techniques can complement hardware model checkers. (2) The proposed lightweight translator makes software analyzers more accessible to the entire research community, as Btor2 can be used as an intermediate representation for analysis, not limited to hardware designs.

#### 1.2 Example

Figure 2 illustrates the proposed translator Btor2C on an example. A circuit whose state is a bit-vector of width 3 is given in Btor2 format in Fig. 2a.

<sup>1</sup> Some tasks used in this paper were excluded due to license issues.

```
1 sort bitvec 3
2 zero 1
3 state 1
4 init 1 3 2
5 input 1
6 add 1 3 5
7 one 1
8 sub 1 6 7
9 next 1 3 8
10 ones 1
11 sort bitvec 1
12 eq 11 3 10
13 bad 12
```
(a) Btor2 circuit

```
 1 extern void abort(void);
 2 extern unsigned char nondet_uchar();
 3 void main() {
 4   typedef unsigned char SORT_1;
 5   typedef unsigned char SORT_11;
 6   const SORT_1 var_2 = 0b000;
 7   const SORT_1 var_7 = 0b001;
 8   const SORT_1 var_10 = 0b111;
 9   SORT_1 input_5;
10   SORT_1 state_3 = var_2;
11   for (;;) {
12     input_5 = nondet_uchar();
13     input_5 = input_5 & 0b111;
14     SORT_11 var_12 = state_3 == var_10;
15     SORT_11 bad_13 = var_12;
16     if (bad_13) {
17       ERROR: abort();
18     }
19     SORT_1 var_6 = state_3 + input_5;
20     var_6 = var_6 & 0b111;
21     SORT_1 var_8 = var_6 - var_7;
22     var_8 = var_8 & 0b111;
23     state_3 = var_8;
24   }
25 }
```
(b) C program (simplified for demo)

Fig. 2: An example Btor2 circuit (a) and its translated C program (b)

The bit-vector is initialized to 0 (lines 2-4). In every iteration, the value of the bit-vector will be incremented by the value of the external input (lines 5-6) and then decremented by 1 (lines 7-8). The circuit reaches a bad state (i.e., violates the safety property) if the value of the bit-vector equals 0b111 (lines 12-13). The translated C program is shown in Fig. 2b. Btor2C first looks for the sorts used in the input Btor2 file. In this example, bit-vectors of 3 bits and 1 bit are used, and Btor2C encodes them with the shortest possible unsigned integer type unsigned char (lines 4-5). After sort declarations, Btor2C defines constants, declares inputs, and initializes circuit states (lines 6-10). An infinite loop is created to simulate the behavior of a sequential circuit. At the beginning of the loop, the safety property is evaluated. If the property is violated (namely, variable bad\_13 evaluates to true), the program reaches the error location at line 17. Otherwise, the next-state value (stored in variable var\_8) is computed and assigned to the current state (lines 19-23), and another loop iteration follows. After the translation, we can apply software verifiers to the translated program in Fig. 2b to check whether the circuit in Fig. 2a conforms to the specified safety property.
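Because the example circuit has only eight states, its safety property can also be checked by exhaustive exploration. The C sketch below is such a check, written under assumptions and not part of Btor2C: the functions `next_state`, `is_bad`, and `steps_to_bad` are hypothetical names, the nondeterministic input is replaced by enumeration of all 3-bit values, and the mask is written as `0x7` instead of `0b111` to avoid relying on binary literals.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned char SORT_1;   /* carrier type for 3-bit vectors */

/* Next-state function of the circuit: add the input, subtract one,
 * and truncate to 3 bits after each operation. */
SORT_1 next_state(SORT_1 state, SORT_1 input) {
    input = input & 0x7u;
    SORT_1 sum = (SORT_1)((state + input) & 0x7u);
    /* Unsigned arithmetic, so 0 - 1 wraps and the mask yields 7. */
    return (SORT_1)((sum - 1u) & 0x7u);
}

/* Safety property: the state must never equal the all-ones vector. */
bool is_bad(SORT_1 state) { return state == 0x7u; }

/* Breadth-first search from the initial state 0. Returns the least
 * number of transitions after which the bad state is reached, or -1
 * if it is unreachable. */
int steps_to_bad(void) {
    int dist[8];
    SORT_1 queue[8];
    int head = 0, tail = 0;
    for (int i = 0; i < 8; i++) dist[i] = -1;
    dist[0] = 0;
    queue[tail++] = 0;
    while (head < tail) {
        SORT_1 s = queue[head++];
        if (is_bad(s)) return dist[s];
        for (SORT_1 in = 0; in < 8; in++) {
            SORT_1 t = next_state(s, in);
            if (dist[t] < 0) {
                dist[t] = dist[s] + 1;
                queue[tail++] = t;
            }
        }
    }
    return -1;
}
```

Under these assumptions, `steps_to_bad()` returns 1: with input 0 the state goes from 0 to 0b111, which the loop in Fig. 2b would detect at the start of its second iteration. A software verifier applied to the translated program is expected to report the same violation.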

### 2 Related Work

### 2.1 Compiling Hardware to Software

Several research efforts [48, 68] represent a circuit as a program, primarily to accelerate hardware simulation. The work most closely related to ours is the Verilog-to-C translator v2c [62], used to translate hardware circuits into software programs for bounded [60] and unbounded [61] formal verification. Unlike v2c, our translator uses the Btor2 language as frontend, which is simple to parse and suitable for analysis. In Sect. 5, we compare the performance of software analyzers on C programs generated by v2c and our tool Btor2C.

#### 2.2 Compiling Hardware to Intermediate Representation

Another line of research related to our work is the compilation of hardware to an intermediate representation that eases the burden of analysis. The motivation of these works is to interface real-world designs and problems described in a more abstract language with tools that use a primitive model representation. Our tool Btor2C shares a similar spirit because it interfaces problems in hardware analysis with software techniques. Among other tools, Verilog2SMV [51] and Ver2Smv [59] translate a Verilog circuit into SMV format [34, 56], which can be verified by tools like nuxmv [33]. QuteRTL [71] translates a register-transfer-level hardware design (usually in Verilog or VHDL) to Btor [31], an earlier version of Btor2. EBMC [55] generates SMT formulas in SMT-LIB 2 format [8], which encode the bounded model checking or k-induction problems of a Verilog circuit. Yosys [70], which translates a Verilog circuit into the Aiger or Btor2 formats, also serves the same purpose. Recently, there has been interest in developing an intermediate language for the model-checking research community [67]. The project aims at providing an expressive frontend language as well as an efficient interface with backend model checkers.

### 3 Background

### 3.1 The Btor2 Language

Btor2 is a bit-precise modeling language for word-level sequential circuits. It can be seen as a generalization of the bit-level Aiger format [26]. The essential ingredients of Btor2 relevant to our discussion in Sect. 4 will be introduced below. For the complete syntax, please refer to the Btor2 publication [64].

Each line in a Btor2 file starts with a unique number, used by other lines to identify the entity defined in this line. Such an entity can be either a sort or a node. A sort is either a bit-vector type of an arbitrary width w, denoted by B<sup>w</sup>, or an array type. An array type whose indices and elements are bit-vector types I and E, respectively, is denoted by A<sub>I→E</sub>. A node can be an input, a state, or a result of an operator over other inputs, states, or results. Inputs are external stimuli given to the Btor2 circuit. Memory elements of the circuit are modeled by states. Usually, inputs have bit-vector types, and states can be of either bit-vector or array types.

Operators are the building blocks of a Btor2 circuit. They take arguments of the prescribed types and guarantee a specific type for the result. The general signature for a Btor2 operator is as follows: <node id> <op> <sort id0> <node id1> [<node id2> [<node id3>]], which defines a node to be the computation result of the operator op on node id1 and optionally id2 and id3. The result will have type id0 and can be accessed by id. The operators in Btor2 will be introduced later in Sect. 4 alongside the translation process of Btor2C.

Btor2 also provides constructs like init, next, and bad to describe the safety-reachability problem for sequential circuits. Initial and bad states can be defined by init and bad, respectively. The transition from one state to another is captured by next. In the following, we briefly recap sequential circuits and their model-checking formulation.

#### 3.2 Sequential Circuits and Hardware Model Checking

A sequential circuit is a computational model widely used in the design and analysis of hardware. It consists of a combinational circuit and memory elements. The combinational circuit is in charge of the computation, and the memory elements store the circuit's state. The combinational circuit is a directed acyclic graph whose vertices are logic gates and edges are wires connecting the gates. If the output pin of gate u is connected to an input pin of gate v, we say that u is a fan-in of v, and v is a fan-out of u.

The computation of sequential circuits is segmented into consecutive time frames. Before the first time frame starts, the memory elements are typically reset (described by init). At the beginning of each time frame, the combinational circuit reads the values stored in the memory elements and receives stimuli from the environment. The former is called the current state of the circuit, and the latter is called the external input in this time frame. Propagating the current state and external input through its logic gates, the combinational circuit computes the output response and the new values to be stored in the memory elements (namely, next-state values, described by next). At the end of the time frame, the next-state values are saved into the memory elements, which become the current state for the next time frame.

The model-checking problem of reachability safety for hardware is formulated as follows: Given a sequential circuit and a safety property (usually encoded as an output of the sequential circuit's combinational part, described by bad), decide whether the safety property holds on all executions of the sequential circuit. If the property does not hold on some execution, a hardware model checker generates an input sequence to trigger the output, and the sequential circuit is deemed unsafe with respect to the property. Otherwise, the sequential circuit is considered safe, and a model checker might additionally generate (an overapproximation of) the set of reachable states as correctness witness.

#### 3.3 Software Model Checking

The reachability-safety problem for software is formulated similarly to that for hardware. Given a program and a safety property (usually labeled as an error location in the program), determine whether there is an executable program path that reaches the error location. Although software model checking, unlike hardware model checking, is undecidable in general, many research efforts have been invested into automated solutions to this problem [10, 19, 53], including predicate abstraction [5, 42, 47, 50], counterexample-guided abstraction refinement (CEGAR) [6, 36], and interpolation [49, 58]. Together with the advances in SMT solving [9], these solutions make the verification of industry-scale software such as operating-systems code [4, 7, 23, 32, 37, 54] feasible. We are motivated to explore how these concepts work on hardware.

### 4 Translating Btor2 to C

This section describes the proposed translator Btor2C<sup>2</sup>, implemented in the language C with approximately 1600 lines of code. We first describe the general idea of using C programs to simulate sequential circuits, whose behavior is intrinsically concurrent. The implementations of various Btor2 operators and optimizations in Btor2C are discussed later.

#### 4.1 Simulating Sequential Circuits with C Programs

Sequential circuits work in a concurrent manner: The external input and current state propagate in parallel through the combinational circuitry to produce circuit outputs and next-state values. In contrast, the C programming language is imperative, and hence C programs are generally executed line-by-line.

To capture the behavior of sequential circuits in the context of reachability safety, Btor2C generates C programs with the generic single-loop program in Fig. 3 as a template. In the generic program, the sorts and constants used in the sequential circuit are first defined at the beginning of the main() function. Then, the program initializes the circuit's states. An endless loop is used to mimic the state-transition behavior of the circuit throughout time frames: When a loop iteration begins, the safety property is evaluated over the current state and external input. If the property is violated, the program exits with an error. Otherwise, the next-state values are computed and stored into the state variables. This generic program reflects the reachability safety for sequential circuits.

```
void main() {
  // Define sorts and constants
  // Initialize states
  for (;;) {
    /* Evaluate safety property
    if (bad) {
      ERROR: abort();
    } */
    // Compute and assign next states
  }
}
```
Fig. 3: A generic program to imitate sequential circuits for reachability safety

The commented blocks in the generic program have to be replaced by C instructions to encode the concurrent computation of the sequential circuit. Btor2C assigns every node in the input Btor2 circuit a unique variable in the translated C program. Nodes used for state initialization, state transition, or safety properties are specified by the keywords init, next, or bad, respectively. For such a node, a backward depth-first traversal is applied to collect its transitive fan-in cone, so that signals irrelevant to model checking are left out of the translation. Multiple bad keywords in a Btor2 file are translated to multiple error labels in the C program.
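The backward traversal described above can be sketched as follows. This is a minimal illustration, not the code of Btor2C: the `Node` record, the `-1` marker for unused argument slots, and the function name `collect_cone` are assumptions made for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ARGS 3 /* a Btor2 operator takes at most three node arguments */

/* Hypothetical node record: ids of the fan-in nodes; unused slots hold -1. */
typedef struct {
    int args[MAX_ARGS];
} Node;

/* Mark every node in the transitive fan-in cone of `root` (root included).
 * `nodes` is indexed by node id; `in_cone` must be zero-initialized.
 * The recursion terminates because marked nodes are never revisited. */
void collect_cone(const Node *nodes, size_t count, int root, bool *in_cone) {
    assert(root >= 0 && (size_t)root < count);
    if (in_cone[root]) {
        return;
    }
    in_cone[root] = true;
    for (int i = 0; i < MAX_ARGS; i++) {
        int arg = nodes[root].args[i];
        if (arg >= 0) {
            collect_cone(nodes, count, arg, in_cone);
        }
    }
}
```

Calling `collect_cone` once per init, next, and bad node leaves exactly the relevant nodes marked; unmarked nodes need no variable in the generated program.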

#### 4.2 Variable Naming

We use the unique identification numbers for lines in a Btor2 file to name their corresponding variables in the translated C program. Suppose the unique ID of a line is n. If the line defines a sort, it is named SORT\_n in the C file. If the line defines a state or an input, it is named state\_n or input\_n, respectively. If the line defines a node used for state initialization, transition, or property evaluation, it is named init\_n, next\_n, or bad\_n, respectively, to honor the keywords init, next, or bad. For the rest of the nodes, we name their variables var\_n in the C file.

<sup>2</sup> https://gitlab.com/sosy-lab/software/btor2c
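The naming scheme can be illustrated with a small helper. The enum `LineKind` and the function `c_name` are hypothetical names introduced for this sketch; only the prefixes themselves come from the text.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical classification of a Btor2 line, in the order of the prefixes below. */
typedef enum { K_SORT, K_STATE, K_INPUT, K_INIT, K_NEXT, K_BAD, K_OTHER } LineKind;

/* Write the C identifier for the Btor2 line with number `id` into `buf`.
 * Returns the length of the identifier, as reported by snprintf. */
int c_name(char *buf, size_t size, LineKind kind, long id) {
    static const char *const prefix[] = {
        "SORT", "state", "input", "init", "next", "bad", "var"
    };
    assert(id > 0); /* Btor2 line numbers are positive */
    return snprintf(buf, size, "%s_%ld", prefix[kind], id);
}
```

For instance, `c_name(buf, sizeof buf, K_BAD, 13)` produces the identifier `bad_13` used in Fig. 2b.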

### 4.3 Expressing Btor2 Sorts in C

The language Btor2 supports two sorts: bit-vectors and arrays. Whenever possible, Btor2C represents a bit-vector type B<sup>w</sup> by the shortest unsigned-integer type whose number of bits is greater than or equal to w. For example, a B<sup>3</sup> type with sort ID n is encoded by typedef unsigned char SORT\_n;, and a B<sup>20</sup> type with sort ID m is encoded by typedef unsigned int SORT\_m;. A Btor2 bit-vector type can have an arbitrary width. If a Btor2 circuit uses a bit-vector type longer than 64 bits, Btor2C cannot translate it to a C program, because no C type can accommodate the bit-vector<sup>3</sup>. The missing capability to handle bit-vectors longer than 64 bits is a restriction of Btor2C, but the sacrifice is worthwhile: By encoding bit-vectors with integer variables, native C operators can be directly applied to implement Btor2 operators, which greatly simplifies the analysis of translated programs. As can be seen in Sect. 5, the state-of-the-art software verifiers and testers perform decently on the translated programs. In practice, only about 20 % of the collected Btor2 benchmarking circuits have bit-vectors longer than 64 bits, so we consider the restriction acceptable.
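As a rough sketch of the type selection, the helper below maps a bit-vector width to a C type name under the LP64 data model. The choices for 3 and 20 bits follow the examples in the text; the type names for the 16-bit and 64-bit ranges, and both function names, are assumptions of this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Name of the shortest unsigned type with at least `width` bits (LP64 assumed).
 * Returns NULL for widths that cannot be encoded (0 or more than 64 bits). */
const char *c_type_for_width(unsigned width) {
    if (width == 0 || width > 64) return NULL;
    if (width <= 8) return "unsigned char";
    if (width <= 16) return "unsigned short";
    if (width <= 32) return "unsigned int";
    return "unsigned long";
}

/* Mask that keeps the `width` least-significant bits and clears the spare MSBs. */
uint64_t mask_for_width(unsigned width) {
    assert(width >= 1 && width <= 64);
    return width == 64 ? UINT64_MAX : (UINT64_C(1) << width) - 1;
}
```

The mask corresponds to constants such as 0b111 in Fig. 2b, which reduce a result modulo 2<sup>w</sup>.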

For Btor2 arrays, Btor2C represents them by static arrays. Suppose the sort ID for an array type A<sub>I→E</sub> is n. Let its index type I be B<sup>w</sup> and element type E be encoded by SORT\_m. Then A<sub>I→E</sub> is encoded by the following C instruction: typedef SORT\_m SORT\_n[1 << w];, which means SORT\_n is an array with 2<sup>w</sup> objects of type SORT\_m.
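A concrete instance of this encoding, with sort IDs and widths chosen only for illustration (index width 3, element width 8):

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char SORT_1;  /* bit-vector of width 3, used as index sort */
typedef unsigned char SORT_2;  /* bit-vector of width 8, used as element sort */
typedef SORT_2 SORT_3[1 << 3]; /* array sort with 2^3 = 8 elements of SORT_2 */

/* The array type really holds 2^w elements. */
static_assert(sizeof(SORT_3) == 8 * sizeof(SORT_2), "array sort has 2^3 elements");

/* Number of elements of the array sort, computed from the types. */
size_t array_sort_length(void) {
    return sizeof(SORT_3) / sizeof(SORT_2);
}
```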

### 4.4 Implementing Btor2 Operators in C

The language Btor2 provides various operations, most of which can be easily implemented by the corresponding C operators. Recall that we extend to the next unsigned-integer type to encode a bit-vector type B<sup>w</sup>. As a result, there might be some spare most-significant bits (MSBs) in an unsigned-integer variable. Normally, these bits have to be set to zeros (namely, the computation result is taken modulo 2<sup>w</sup>) after each operation to guarantee the precision. Later in Sect. 4.5, we discuss the possibility of applying the modulo operation to results lazily only when needed, instead of applying it eagerly after each operator. Such laziness helps to generate shorter C programs and provides an opportunity for software analyzers to work more efficiently. In the evaluation, we will also compare the effects of these two translation schemes. Next, we follow the order of Table 1 in the Btor2 paper [64] to introduce the Btor2 operators and their implementations in C.

Indexed Operators. Unsigned- and signed-extension operators uext and sext can be implemented by type casting during the variable assignment. The bit-slicing operator slice is implemented by first right-shifting the operand by the number of sliced-off least-significant bits and then masking the spare MSBs to zeros.
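The indexed operators can be sketched on 64-bit containers as below. Note that this sketch uses explicit masks rather than the type casts mentioned above, so it is an illustrative variant and not necessarily the code that Btor2C emits; all function names are assumptions of this example.

```c
#include <assert.h>
#include <stdint.h>

/* Mask keeping the w least-significant bits, for 1 <= w <= 64. */
uint64_t mask(unsigned w) {
    assert(w >= 1 && w <= 64);
    return w == 64 ? UINT64_MAX : (UINT64_C(1) << w) - 1;
}

/* slice: extract bits upper..lower (both inclusive) of x. */
uint64_t bv_slice(uint64_t x, unsigned upper, unsigned lower) {
    assert(lower <= upper && upper < 64);
    return (x >> lower) & mask(upper - lower + 1);
}

/* uext: with the spare MSBs cleared, the value is already zero-extended. */
uint64_t bv_uext(uint64_t x, unsigned from_w) {
    return x & mask(from_w);
}

/* sext: extend from width from_w to width to_w by replicating the sign bit. */
uint64_t bv_sext(uint64_t x, unsigned from_w, unsigned to_w) {
    assert(from_w <= to_w);
    x &= mask(from_w);
    if (x & (UINT64_C(1) << (from_w - 1))) {
        x |= mask(to_w) & ~mask(from_w); /* fill the new bits with ones */
    }
    return x;
}
```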

<sup>3</sup> We stick to the ISO C18 standard [52]; GNU C offers an unsigned \_\_int128 type, but not every software analyzer supports it. Recently, there is a proposal to support arbitrary-width integers in ISO C23, which will further simplify the translation.

Unary Operators. The bitwise negation operator not is implemented by its counterpart ~ in C. The arithmetic operators inc, dec, and neg are implemented using the ++, --, and - operators in C. The reduction operator redand (resp. redor) is implemented by comparing the operand to 2<sup>w</sup> − 1 (resp. 0) for an operand of type B<sup>w</sup>. As there is no native support in C to compute the sum of all bits modulo 2 (parity) in an integer variable, the reduction operator redxor is implemented by repeatedly shifting and XOR-ing the variable with itself, such that the result will end up in the least-significant bit.
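A minimal sketch of the three reduction operators on a 64-bit container follows. The function names and the fixed folding sequence for redxor are choices made for this example; the translator may emit a different shift sequence per width.

```c
#include <assert.h>
#include <stdint.h>

/* Mask keeping the w least-significant bits, for 1 <= w <= 64. */
uint64_t mask(unsigned w) {
    assert(w >= 1 && w <= 64);
    return w == 64 ? UINT64_MAX : (UINT64_C(1) << w) - 1;
}

/* redand: 1 iff all w bits are set, i.e., the operand equals 2^w - 1. */
uint64_t bv_redand(uint64_t x, unsigned w) {
    return (x & mask(w)) == mask(w);
}

/* redor: 1 iff any of the w bits is set, i.e., the operand differs from 0. */
uint64_t bv_redor(uint64_t x, unsigned w) {
    return (x & mask(w)) != 0;
}

/* redxor: parity of the w bits, folded into the least-significant bit. */
uint64_t bv_redxor(uint64_t x, unsigned w) {
    x &= mask(w);
    x ^= x >> 32;
    x ^= x >> 16;
    x ^= x >> 8;
    x ^= x >> 4;
    x ^= x >> 2;
    x ^= x >> 1;
    return x & 1;
}
```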

Binary Operators. For bit-vectors, the (in)equality operators eq, neq, gt, gte, lt, and lte are implemented by the corresponding C operators. For arrays, the equality operator is implemented by looping over the two input arrays to find a differing element. Bitwise operators and, or, and xor<sup>4</sup> and arithmetic operators add, mul, div, rem (remainder), and sub are all supported in C and can be directly implemented using the respective C operators. In the language Btor2, the result of division-by-zero is defined to be the maximum number of the operands' sort. Our translation takes this specification into account to generate equivalent C programs. Otherwise, division-by-zero would be considered as undefined behavior in C.
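The division-by-zero rule and the array comparison can be sketched as follows. The function names and the use of `uint8_t` elements are assumptions of this example; only unsigned division is shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mask keeping the w least-significant bits, for 1 <= w <= 64. */
uint64_t mask(unsigned w) {
    assert(w >= 1 && w <= 64);
    return w == 64 ? UINT64_MAX : (UINT64_C(1) << w) - 1;
}

/* Unsigned division on w-bit operands: dividing by zero yields the maximum
 * value of the sort (all ones) instead of triggering undefined behavior. */
uint64_t bv_udiv(uint64_t a, uint64_t b, unsigned w) {
    uint64_t m = mask(w);
    a &= m;
    b &= m;
    return b == 0 ? m : a / b;
}

/* Array equality: scan both arrays for a differing element. */
bool array_eq(const uint8_t *a, const uint8_t *b, size_t length) {
    for (size_t i = 0; i < length; i++) {
        if (a[i] != b[i]) {
            return false;
        }
    }
    return true;
}
```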

Shifting operators sll (logical left shift) and srl (logical right shift) are implemented by the left- and right-shifting operators in C, respectively. According to the ISO C18 standard [52], the result of right-shifting a negative value is implementation-defined. Therefore, to ensure the intended behavior of the arithmetic right-shift operator sra, we always pad ones directly to the resulting value if the given operand is negative (i.e., MSB equals 1). In this way, we do not have to assume any specific implementation of the software verifiers.
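One way to realize the padding for sra without relying on implementation-defined behavior is sketched below. The function name and the handling of shift amounts that reach or exceed the width are assumptions of this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mask keeping the w least-significant bits, for 1 <= w <= 64. */
uint64_t mask(unsigned w) {
    assert(w >= 1 && w <= 64);
    return w == 64 ? UINT64_MAX : (UINT64_C(1) << w) - 1;
}

/* Arithmetic right shift of a w-bit value stored in an unsigned container.
 * Only unsigned shifts are used; vacated bits are padded with ones by hand
 * when the operand is negative (its MSB is 1). */
uint64_t bv_sra(uint64_t x, unsigned shift, unsigned w) {
    uint64_t m = mask(w);
    x &= m;
    bool negative = ((x >> (w - 1)) & 1) != 0;
    if (shift >= w) {
        return negative ? m : 0; /* every bit becomes a copy of the sign bit */
    }
    uint64_t result = x >> shift;
    if (negative) {
        result |= m & ~(m >> shift); /* set the `shift` topmost bits */
    }
    return result;
}
```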

Concatenating and rotating operators concat, rol (rotating left), and ror (rotating right), are not natively supported in C. We implemented them by shifting and bitwise disjunction. For example, in order to concatenate node n<sub>1</sub> of type B<sup>3</sup> and node n<sub>2</sub> of type B<sup>5</sup>, we use var\_1 << 5 | var\_2, assuming var\_1 and var\_2 are of type unsigned char.
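The shift-and-or construction generalizes as in the following sketch. The function names are hypothetical, and the rotation amount is reduced modulo the width, which is an assumption of this example.

```c
#include <assert.h>
#include <stdint.h>

/* Mask keeping the w least-significant bits, for 1 <= w <= 64. */
uint64_t mask(unsigned w) {
    assert(w >= 1 && w <= 64);
    return w == 64 ? UINT64_MAX : (UINT64_C(1) << w) - 1;
}

/* concat: `hi` (hi_w bits) becomes the upper part, `lo` (lo_w bits) the lower. */
uint64_t bv_concat(uint64_t hi, unsigned hi_w, uint64_t lo, unsigned lo_w) {
    assert(hi_w + lo_w <= 64);
    return ((hi & mask(hi_w)) << lo_w) | (lo & mask(lo_w));
}

/* rol: rotate the w-bit value x to the left by k positions. */
uint64_t bv_rol(uint64_t x, unsigned k, unsigned w) {
    x &= mask(w);
    k %= w;
    if (k == 0) {
        return x; /* avoids a shift by the full width */
    }
    return ((x << k) | (x >> (w - k))) & mask(w);
}

/* ror: rotating right by k equals rotating left by w - k. */
uint64_t bv_ror(uint64_t x, unsigned k, unsigned w) {
    return bv_rol(x, w - k % w, w);
}
```

With the operands from the text, `bv_concat(0x5, 3, 0x03, 5)` computes the same value as `var_1 << 5 | var_2`.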

The read operator for array types, which takes an array and an index, is simply implemented by C's syntax to access an array.

Ternary Operators. The if-then-else operator ite works both for bit-vectors and arrays. It is implemented by the ternary operator exp1 ? exp2 : exp3 in C.

The write operator takes an array, an index for where to write, an element for what to write, and returns an updated array. It is implemented using the standard syntax in C to modify the content of an array.

Note that in a Btor2 file, a line with operator write essentially creates a new copy of the original array with one updated element. The original array is not replaced, because it might also be referred to by other lines. In principle, if no lines access the original array after a write operation, the operation could modify the element in place without allocating a new array. For now, Btor2C always copies a new array during a write operation for simplicity.
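The copy-on-write behavior of the array operators can be sketched as follows, here for a fixed index width of 3 and 8-bit elements. The type `Array` and both function names are assumptions of this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { INDEX_WIDTH = 3, ARRAY_LENGTH = 1 << INDEX_WIDTH };
typedef uint8_t Array[ARRAY_LENGTH];

/* read: plain array access. */
uint8_t array_read(const Array array, unsigned index) {
    assert(index < (unsigned)ARRAY_LENGTH);
    return array[index];
}

/* write: `dst` becomes a copy of `src` with one element replaced.
 * `src` is left untouched because other nodes may still refer to it. */
void array_write(Array dst, const Array src, unsigned index, uint8_t value) {
    assert(index < (unsigned)ARRAY_LENGTH);
    memcpy(dst, src, sizeof(Array));
    dst[index] = value;
}
```

The in-place variant mentioned above would skip the `memcpy` when no later line reads `src`.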

<sup>4</sup> The operators nand, nor, and xnor are implemented with the bitwise NOT operator.

#### 4.5 Applying Modulo Operations Lazily

Observe that there are some operators that can work correctly without precise operand values, which offers us the opportunity to apply modulo operations lazily and save some computations in translated programs. For instance, consider the addition operator. If a<sub>1</sub> ≡ a<sub>2</sub> (mod n) and b<sub>1</sub> ≡ b<sub>2</sub> (mod n), we conclude that a<sub>1</sub> + b<sub>1</sub> ≡ a<sub>2</sub> + b<sub>2</sub> (mod n) according to modular arithmetic. In other words, the addition operator does not need precise operands and works correctly for modular numbers (i.e., equivalence classes modulo n). By contrast, other operators might yield different results for modular numbers. For example, a + kn > b does not guarantee a > b when k > 0. Therefore, applying the modulo operation to the result of an operator is only necessary where the result is used in another operator that requires precise operand values.
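The argument can be checked on the transition of Fig. 2 (add the input, then subtract 1, on 3 bits). The sketch below compares an eager and a lazy variant exhaustively; the function names are hypothetical, and the explicit casts are added here only to keep all conversions well defined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { WIDTH = 3, MASK = (1 << WIDTH) - 1 };

/* Eager: clear the spare MSBs after every operation. */
uint8_t step_eager(uint8_t state, uint8_t input) {
    assert(state <= MASK);
    uint8_t sum = (uint8_t)((state + input) & MASK);
    return (uint8_t)((uint8_t)(sum - 1) & MASK);
}

/* Lazy: the sum feeds only a subtraction, so it may stay imprecise;
 * the mask is applied once, before the result becomes the new state. */
uint8_t step_lazy(uint8_t state, uint8_t input) {
    assert(state <= MASK);
    uint8_t sum = (uint8_t)(state + input); /* may use the spare MSBs */
    return (uint8_t)((uint8_t)(sum - 1) & MASK);
}

/* Exhaustively compare both variants over all states and input bytes. */
bool steps_agree(void) {
    for (unsigned state = 0; state <= (unsigned)MASK; state++) {
        for (unsigned input = 0; input <= UINT8_MAX; input++) {
            if (step_eager((uint8_t)state, (uint8_t)input) !=
                step_lazy((uint8_t)state, (uint8_t)input)) {
                return false;
            }
        }
    }
    return true;
}
```

Both variants agree because 2<sup>3</sup> divides 2<sup>8</sup>, so wrapping in the 8-bit container preserves the value modulo 8. A comparison behaves differently: with a = 1, b = 2, and n = 8, a + n > b holds although a > b does not.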

Btor2C provides an option for the lazy application of modulo operations. If the option is turned on, Btor2C analyzes whether the precise value is required for each node by looking at the node's fan-outs. If any of its fan-outs needs the precise computation result of the node, the modulo operation will be applied to it. Otherwise, the modulo operation will be skipped, and the result could be a modular number of the precise value. Operators that require precise operand values mainly include inequalities as well as indices for reading and writing arrays. As an example, if we enable the lazy behavior to translate the Btor2 circuit in Fig. 2a, the modulo operations in line 13 and line 20 of the program in Fig. 2b can be omitted, because input\_5 and var\_6 are used only in addition and subtraction, which do not need precise operand values.
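The fan-out check can be sketched as below. The operator list is deliberately small, and its classification is an assumption of this example that follows the rule of thumb stated above (inequalities and array indices need precise operands; addition and subtraction do not). It is not the complete table used by Btor2C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of operators that may consume a node's result. */
typedef enum {
    OP_ADD, OP_SUB, OP_ULT, OP_UGT, OP_READ_INDEX, OP_WRITE_INDEX
} Op;

/* Does this operator need the precise (masked) value of its operand? */
bool op_needs_precise_operand(Op op) {
    switch (op) {
    case OP_ULT:
    case OP_UGT:
    case OP_READ_INDEX:
    case OP_WRITE_INDEX:
        return true;
    case OP_ADD:
    case OP_SUB:
        return false;
    }
    assert(!"unknown operator");
    return true; /* conservative fallback: keep the modulo operation */
}

/* A node's result must be masked iff at least one fan-out needs precision. */
bool needs_mask(const Op *fanouts, size_t count) {
    for (size_t i = 0; i < count; i++) {
        if (op_needs_precise_operand(fanouts[i])) {
            return true;
        }
    }
    return false;
}
```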

#### 4.6 Discussion

Correctness of the Translation. As will be seen in Sect. 5, the reliability of Btor2C is empirically validated over a large input set: On the translated C programs, most software verifiers obtain answers consistent with those of the hardware verifiers. For Btor2 models that violate the safety property, the violation witness generated by software verifiers can be transformed to that of the original Btor2 circuit as a certificate of the translation process. The Btor2Tools utility suite offers a simulator to check the transformed witness against the Btor2 model.

Limitations. The current version of Btor2C has no support yet for the translation of fairness constraints (keyword fair), liveness properties (keyword justice), and overflow detection (keywords addo, divo, mulo, and subo). In our evaluation, only supported keywords appear in the collected Btor2 circuits.

### 5 Evaluation

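We evaluate the claims presented in Sect. 1.1 using the following research questions:

- RQ1: Solving HW-Verification Tasks with SW Analyzers
- RQ2: Complementing HW Model Checkers with SW Analyzers
- RQ3: Optimization in Btor2C
- RQ4: Comparison with v2c
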
To answer the above research questions, we evaluated the state of the art of hardware and software analyzers over a large benchmark set consisting of more than a thousand hardware-verification tasks.

#### 5.1 Benchmark Set

We collected hardware-verification tasks in both Btor2 and Verilog formats from various sources, including the benchmark suites used in the 2019 and 2020 Hardware Model Checking Competitions [29] and the explicit-state model-checking tasks derived from the BEEM project [65]. The whole benchmark set as well as a complete list of sources are available in the reproduction artifact [16] of this paper. We also contributed a set of verification tasks to the sv-benchmarks collection, the largest freely available benchmark set of the verification and testing community.

As the proposed translator Btor2C uses Btor2 as frontend, we translated tasks in Verilog to Btor2 with Yosys [70]. An aggregate of 1912 Btor2 tasks were collected. We excluded 414 tasks with bit-vectors longer than 64 bits, because Btor2C cannot translate these tasks into standard ISO C18 programs. Out of the remaining 1498 Btor2 tasks, 1341 use only bit-vector sorts, and the remaining 157 tasks manipulate both bit-vector and array sorts. The bit-vector category contains 473 unsafe tasks (with a known specification violation) and 868 safe tasks (for which the specification is satisfied). The array category contains 17 unsafe and 140 safe tasks.

We translated the remaining 1498 Btor2 tasks into C programs by the proposed tool Btor2C (tag tacas23-camera), assuming the LP64 data model. The 1341 tasks in the bit-vector category are also translated to Aiger by the translator Btor2AIGER, which is provided in the Btor2Tools utility suite. The original Btor2 models as well as the translated C programs and Aiger circuits are available in the reproduction package [16] and online<sup>5</sup>.

Unfortunately, Btor2AIGER does not translate Btor2 circuits with array sorts to Aiger. In our benchmark set, translating a Btor2 file to either a C program or an Aiger circuit took less than a second. Therefore, we ignore the translation time in the run-time of the compared tools. An input task in the required format is directly given to each tool. To facilitate the comparison with v2c, we additionally gathered 22 C programs translated by v2c from its repository<sup>6</sup>.

#### 5.2 State-of-the-Art Hardware and Software Analysis

To adequately reflect the state of the art of hardware and software analysis, we evaluated the most competitive tools from the Hardware Model Checking Competitions and Competitions on Software Verification and Testing. A wide range of analysis techniques implemented in these tools were investigated in our experiment. Due to space limitation, Sect. 5.4 will show the best configuration of each tool on our benchmark set.

Hardware Model Checkers. For hardware analysis, we selected the state-of-the-art bit-level model checker ABC [30] (commit a9237f5<sup>7</sup>) and AVR [46] version 2.1, a word-level hardware model checker that won HWMCC 2020. The former takes Aiger circuits as input, and the latter directly consumes Btor2 models. We evaluated the implementations of bounded model checking (BMC) [27] and property directed reachability (PDR) [41, 45] in both ABC and AVR. Interpolation-based model checking (IMC) [57] in ABC and k-induction (KI) [69] in AVR were also assessed.

<sup>5</sup> https://gitlab.com/sosy-lab/research/data/word-level-hwmc-benchmarks

<sup>6</sup> https://github.com/rajdeep87/verilog-c

<sup>7</sup> https://github.com/berkeley-abc/abc

Software Analyzers. For software verifiers, we enrolled the first-, second-, and fourth-ranked verifiers VeriAbs [2], CPAchecker [20], and Esbmc [43] of category ReachSafety in SV-COMP 2022. The third-ranked verifier PeSCo [66] was omitted because it selects algorithms from the CPAchecker framework. All verifiers were downloaded from the archiving repository<sup>8</sup> of the competition. (For Esbmc, the performance of an earlier version in SV-COMP 2021 was better than the latest version on our benchmark set, so we used the older version instead.) We tried the implementations of loop abstraction (LA) [38] in VeriAbs; predicate abstraction (PA) [18, 50], Impact [24, 58], and IMC [21] in CPAchecker; BMC and KI [17, 18, 39, 44] in both CPAchecker and Esbmc.

For software testers, the overall winner FuSeBMC [3] of Test-Comp 2022, which implements fuzz testing (fuzzing), was picked. We also experimented with other testers from the competition, but they failed to generate test suites on our benchmark set. FuSeBMC was downloaded from the archiving repository<sup>9</sup> of the competition.

In the following discussion, we use ⟨tool⟩-⟨algorithm⟩ to denote the implementation of a specific algorithm in a particular tool. For example, AVR-KI refers to the k-induction implementation in AVR.

#### 5.3 Experimental Setup

All experiments were conducted on machines running Ubuntu 22.04 (64 bit), each with a 3.4 GHz CPU (Intel Xeon E3-1230 v5) with 8 processing units and 33 GB of RAM. Each task was limited to 2 CPU cores, 15 min of CPU time, and 15 GB of RAM. We used BenchExec<sup>10</sup> [22] to ensure reliable resource measurement and reproducible results.

#### 5.4 Results

RQ1: Solving HW-Verification Tasks with SW Analyzers. To study the performance of software analyzers on hardware-verification tasks, we compared the selected software tools against the state-of-the-art hardware model checkers. The results are summarized in Table 1.

Note that some software verifiers are good at finding bugs in these tasks. VeriAbs found the most correct alarms in the experiment, and Esbmc also detected more bugs than AVR. By contrast, hardware model checkers were better at computing correctness proofs. Even the best software configuration for proving correctness, CPAchecker-PA, achieved fewer than half of the proofs for bit-vector tasks. In the array category, AVR delivered 45 correct proofs, whereas the software verifiers could not solve any of them. Our results may inspire tool developers to investigate and alleviate the performance difference. Since we have contributed a category ReachSafety-Hardware of verification tasks to the common benchmark collection, the 2023 competition results of SV-COMP include evaluations of all participating tools on those new tasks.

<sup>8</sup> https://gitlab.com/sosy-lab/sv-comp/archives-2022/-/tree/svcomp22

<sup>9</sup> https://gitlab.com/sosy-lab/test-comp/archives-2022/-/tree/testcomp22

<sup>10</sup> https://github.com/sosy-lab/benchexec

Table 1: Summary of the results for hardware and software verifiers (suffixes -e and -l stand for applying modulo operations eagerly or lazily, respectively)

Fig. 4: Quantile plots for all correct proofs and alarms of bit-vector tasks

The quantile plots of correct proofs and alarms for bit-vector tasks are shown in Fig. 4a and Fig. 4b, respectively. A data point (x, y) in the plots indicates that there are x tasks correctly solvable by the respective tool within a CPU time of y seconds. In our experiments, ABC is the most efficient and effective tool in producing proofs, and VeriAbs is the best for bug hunting. Although Esbmc found more alarms than AVR and nearly as many as ABC, it generally spent more time finding bugs.

In our evaluation, we observe that PDR is the most competitive algorithm for both hardware model checkers, whereas software verifiers show diverse strengths in different approaches. To account for the difference in algorithms, we also compare implementations of the same algorithm in various analyzers.

BMC is one of the most popular formal approaches to detect errors. It is implemented by most of the evaluated tools. Software testers are also able to hunt bugs, and hence we include FuSeBMC, a derivative of Esbmc that combines BMC and fuzzing, into the comparison. Figure 5 shows the quantile plot of correct alarms for unsafe bit-vector tasks. Note that the performance of the BMC implementations in software verifiers is close to that of the implementations in hardware verifiers. However, FuSeBMC did not perform as well as the other competitors, indicating that fuzzing might not be fruitful for our benchmark set.

Fig. 5: Quantile plot comparing bug hunting (with BMC) on bit-vector tasks

Fig. 6: CPU time (left) and memory (right) consumption of AVR-KI and Esbmc-KI

We also performed a head-to-head comparison of the k-induction implementations in AVR and Esbmc over the bit-vector and array tasks. Both tools rely on SMT solving for formula reasoning, so there are fewer confounding variables than in other combinations. Figure 6 shows the scatter plots for the CPU time and memory usage of AVR and Esbmc to produce correct results. A data point (x, y) in the plots indicates the existence of a task correctly solved by both tools, for which Esbmc took x units of the computing resource and AVR took y units. AVR was often more efficient than Esbmc, but the latter solved 13 tasks that the former could not solve.

RQ2: Complementing HW Model Checkers with SW Analyzers. Overall, hardware model checkers performed better than software analyzers on our benchmark set, which is expected since they have been heavily optimized for hardware-verification tasks. However, comparing the results of the tools in Table 1, we observed 43 tasks that were uniquely solved by software verifiers. Interestingly, 39 of these uniquely solved tasks have a violated property. Combining BMC with loop unwinding heuristics, e.g., the technique implemented in VeriAbs [2], is helpful to find bugs in these tasks. This phenomenon demonstrates that software-analysis techniques are able to complement hardware model checkers, which is facilitated by the proposed Btor2C translator. Some potential reasons affecting the effectiveness and efficiency of software analyzers will be discussed in Sect. 5.5.

Table 2: Results for 22 programs generated by Btor2C and v2c

RQ3: Optimization in Btor2C. Section 4.5 presented an optimization technique that performs modulo operations to intermediate results lazily, in order to generate shorter C programs. To assess whether this technique benefits the downstream software analysis, we compared the performance of the selected software verifiers, CPAchecker, Esbmc, and VeriAbs, on C programs translated by Btor2C with or without this optimization (namely, applying modulo operations lazily or eagerly, respectively).

The results of the best-performing algorithm for each tool in terms of the number of correct answers are summarized in Table 1, whose right panel also shows the results of the verifiers on these two sets of C programs. (CPAchecker-BMC actually solved more tasks than CPAchecker-PA, but mainly for bug hunting. Therefore, we reported the second-best configuration, predicate abstraction, for CPAchecker.) If modulo operations are applied lazily instead of eagerly, the number of overall correct results increases by roughly 2.2 % for both CPAchecker and Esbmc, and by 0.3 % for VeriAbs. Although VeriAbs found 4 fewer correct proofs if modulo operations are applied lazily, it reported 5 more correct alarms. Therefore, we conclude that generating shorter C programs by reducing modulo operations is an effective optimization in Btor2C. From now on, Btor2C enables this optimization by default.

RQ4: Comparison with v2c. Btor2C is a lightweight tool, whose compiled binary is smaller than 0.25 MB. By contrast, the precompiled v2c executable downloaded from its web archive<sup>11</sup> is 5.7 MB. While such a difference is negligible given the capability of modern computers, we believe that a simple frontend language benefits tool implementation.

Besides implementation complexity, we also investigated the efficiency of the translation process. As mentioned in Sect. 5.1, Btor2C took less than a second to translate any Btor2 model in the benchmark set. Unfortunately, the v2c executable in the archive was not runnable, and its source code was not compilable<sup>12</sup>. Therefore, we were not able to directly compare the translation efficiency of Btor2C and v2c.

<sup>11</sup> https://www.cs.ox.ac.uk/people/rajdeep.mukherjee/tacas16\_v2c.tar.gz

<sup>12</sup> https://github.com/rajdeep87/verilog-c/issues/6

As an alternative, we collected 22 C programs from v2c's benchmark repository and manually adapted them to the syntax rules used in SV-COMP. The original Verilog circuits of these C programs were translated to Btor2 by Yosys and further translated by Btor2C into another set of C programs. We compare the performance of the evaluated software verifiers on these two sets of 22 verification tasks in Table 2. Observe that the three verifiers produced more correct results on the C programs generated by Btor2C, showing the benefit of using Yosys + Btor2 as frontend in the translation flow.

#### 5.5 Discussion

From the experimental results shown above, we observe a notable performance difference between software and hardware analyzers. There are several possibilities to explain this outcome: First, the tasks were encoded in different formats for software and hardware analyzers. Btor2C encoded bit-vectors with unsigned integer types, which may contain some spare bits that complicate software analysis. Second, each analyzer uses a different backend logical solver. ABC encodes queries in propositional logic and uses SAT solving, while other tools resort to first-order formulas and SMT solving. (In our experiments, AVR used Yices2 [40], CPAchecker used MathSAT5 [35] for predicate abstraction and Boolector3 [64] for BMC, and Esbmc used Boolector3.) The ability of solvers may affect the analyzers' performance. Third, the internal modeling used by the analyzers varies. Software verifiers typically represent a program as a control-flow graph, which might be unnecessarily complex when the problem at hand is merely a state-transition system. Despite the above reasons, software verifiers were able to solve 43 tasks that the considered hardware model checkers could not solve.

### 6 Conclusion

Assuring the correctness of computational systems is challenging yet imperative. Therefore, we should embrace every opportunity to analyze our systems by removing the barriers between research communities. We implemented the lightweight and open-source tool Btor2C for translating sequential Btor2 circuits to C programs, to enable the application of off-the-shelf software analyzers to hardware designs. We conducted a large-scale experiment including more than a thousand verification tasks. State-of-the-art bit-level and word-level model checkers as well as software verifiers and testers were evaluated empirically. Thanks to the simplicity of the Btor2 language, software analyzers performed decently on the translated programs and complemented the hardware model checkers by detecting more bugs and uniquely solving 43 tasks in our experiment. Our translator Btor2C demonstrates a new spectrum of analysis options to hardware developers and verification engineers. The translator also simplifies the construction of a new set of hardware analyzers, because any software analyzer can now be used to solve hardware-verification tasks, with Btor2C as preprocessing. In the future, we wish to bridge the gap from the other direction. That is, we aim to translate programs into circuits and apply hardware analyzers to solve software problems.

Data-Availability Statement. To enhance the verifiability and transparency of the results reported in this paper, all used software, verification tasks, and raw experimental results are available in a supplemental reproduction package [16]. A previous version [15] of the reproduction package was reviewed by the Artifact Evaluation Committee. The updated version [16] fixes issues found by reviewers of the paper and the artifact. For convenient browsing of the data, interactive result tables are also available at https://www.sosy-lab.org/research/btor2c/.

Funding Statement. This project was funded in part by the Deutsche Forschungsgemeinschaft (DFG) – 378803395 (ConVeY).

Acknowledgements. We thank the SV-COMP community and an anonymous reviewer for pointing out the division-by-zero issue.

### References


Open Access. This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution, and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## CoPTIC: Constraint Programming Translated Into C

Martin Mariusz Lester

University of Reading, Reading, United Kingdom m.lester@reading.ac.uk

Abstract. Constraint programming systems allow a diverse range of problems to be modelled and solved. Most systems require the user to learn a new constraint programming language, which presents a barrier to novice and casual users. To address this problem, we present the CoPTIC constraint programming system, which allows the user to write a model in the well-known programming language C, augmented with a simple API to support using a guess-and-check paradigm. The resulting model is at most as complex as an ordinary C program that uses naive brute force to solve the same problem.

CoPTIC uses the bounded model checker CBMC to translate the model into a SAT instance, which is solved using the SAT solver CaDiCaL. We show that, while this is less efficient than a direct translation from a dedicated constraint language into SAT, performance remains adequate for casual users. CoPTIC supports constraint satisfaction and optimisation problems, as well as enumeration of multiple solutions. After a solution has been found, CoPTIC allows the model to be run with the solution; this makes it easy to debug a model, or to print the solution in any desired format.

Keywords: constraint programming · bounded model checking · C programming language

### 1 Introduction

Constraint programming is a form of declarative programming. A constraint program or model typically declares some variables and asserts a certain relationship that must hold between them. A constraint solver automatically finds values of the variables that satisfy the constraints.

There is a broad body of research in constraint programming, which explores different kinds of constraints, different languages for expressing them, and different methods for solving them. If you know you are likely to become a frequent user of constraint programming, it is relatively easy to take advantage of this. After making the effort to learn a standardised constraint language, such as MiniZinc, you have easy access to a range of common constraints and solvers.

But what if you are a casual user, who encounters a single problem that is too complex or time-consuming to solve by hand, but might be easy with the assistance of a computer? You may be tempted to prototype a solution using a simple technique such as brute force or backtracking search. This may well work, but it is easy to make an error when writing such a program. Or the problem may turn out to be computationally harder than expected. Alternatively, you may try to learn a constraint programming language, but if the effort required is high and the process is error-prone, you may be deterred. Furthermore, if you do not need to use the language again for months or years, you may well have forgotten it by then, meaning that much of the effort is wasted.

To meet the needs of this kind of user, we introduce the CoPTIC (Constraint Programming Translated Into C) system for constraint programming. CoPTIC reduces the effort needed to write a model by allowing the user to write it in a declarative style as a C program. It achieves this by using the existing program verification tool CBMC, which in turn uses a SAT solver.

In outline, the C program must first declare all variables in the constraint problem and assign them a nondeterministic value. Next, it assumes that all of the constraints hold; paths where they do not hold should be ignored. Finally, it asserts false; that is, it is an error if the program reaches its end.

We can pass the program to CBMC and ask it to verify that the assertion cannot be violated. CBMC tries to find a resolution of the nondeterminism that leads to an assertion violation; it does this by encoding the problem as a SAT instance and solving it with a SAT solver. It reports back a counterexample trace to the verification problem. By construction, the values of the variables in this trace satisfy the constraints.

This idea is fairly straightforward for someone familiar with CBMC to apply in an ad-hoc way to a particular problem. However, a usable constraint programming system needs more than this. The contributions of this paper are the implementation, description and experimental evaluation of the CoPTIC system, which automates and extends the process outlined above.

We illustrate how to write constraint models in the guess-and-check paradigm outlined above with examples and explain how CoPTIC solves these models using CBMC in Section 2. We show how CoPTIC makes it easy for a user to:


In particular, CoPTIC reads resolved nondeterministic choices from CBMC's counterexample trace and constructs a C function that replays them when the program is compiled and run with an ordinary compiler. A similar construction has been used by Beyer and others to produce tests from verification witnesses [3], but CoPTIC uses it to display the solution to the constraint model.

We discuss debugging constraint models, efficiency of the SAT encoding and some other practical considerations in Section 3. Next, we evaluate CoPTIC empirically on problems from CSPLib in Section 4, considering both solver performance and the size of the models. The software artifact accompanying this paper [15] contains the source code for CoPTIC, which is released under the MIT License, as well as the models and scripts needed to reproduce our experiments. In Section 5 we discuss related work in constraint programming and automated verification, before concluding in Section 6.

### 2 The Guess-and-Check Paradigm

CoPTIC constraint models are C programs that mix the language's conventional imperative style with a declarative guess-and-check paradigm. To illustrate how the system is used and how it works, we now consider some worked examples. First, we will see that the code in the CoPTIC models is similar to a naive attempt to solve the problems using brute force or backtracking search (but often faster in execution). We argue that this makes the system easy to learn and to use for programmers with little knowledge or experience of constraint programming. Then we will see how to extend the approach to solution enumeration and optimisation.

Many finite-domain constraint problems are in the complexity class NP. NP problems can be characterised as those that:


CoPTIC exploits this equivalence constructively. Given a guess-and-check program that verifies a certificate, we can view CoPTIC as compiling the program into SAT with CBMC, which executes the nondeterministic program with a SAT solver. CoPTIC extracts the certificate, hard-codes it into the program to make it deterministic, then compiles it with a normal compiler and executes it deterministically.

#### 2.1 Constraint Satisfaction: Magic Square

Let us consider the well-known problem of finding a normal 3 × 3 magic square. A normal n × n magic square is an n × n grid containing each of the integers from 1 to n² exactly once, where every row, every column and both diagonals have the same sum.
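The common sum is determined by n alone, which is why the programs in Figure 1 can define it as the constant TARGET. A short derivation, assuming each integer from 1 to n² appears exactly once: the n rows partition the grid, so each row receives an equal share of the total.

$$
\sum_{k=1}^{n^2} k = \frac{n^2(n^2+1)}{2},
\qquad
\text{TARGET} = \frac{1}{n}\cdot\frac{n^2(n^2+1)}{2} = \frac{n(n^2+1)}{2},
\qquad
n = 3 \;\Rightarrow\; \text{TARGET} = \frac{3 \cdot 10}{2} = 15 .
$$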

Suppose we try to solve this problem using brute force. We write the simple program shown on the left side of Figure 1, which iterates through all possible assignments of integers to each grid cell. The program checks each assignment to see if it meets all the required constraints. As soon as one does, it prints it out and terminates.

We are pleased to see that, after a few minutes, the program finds a solution. Next we try a larger square, and are dismayed to find that the running time of the program increases drastically.

How could we solve this problem more efficiently? The right side of Figure 1 shows the program adapted for use with CoPTIC. The program begins by including the coptic.h header file. Now, instead of iterating through each possible assignment explicitly, the program GUESSes the values of the grid cells. The checks are much the same as before, but use CoPTIC's CHECK macro. The call to SATISFY indicates that we want to find any solution that satisfies all the constraints, while the code in the OUTPUT block is run only when a solution is found. We run the modified program with CoPTIC and are once again pleased as it finds a solution in a few seconds.

```
// Fig. 1 (left): brute-force search.
#define N 3
#define MAX (N * N)
#define TARGET ((((N * N) + 1) * N) / 2)
#include <stdio.h>
int main() {
  int grid[N][N];
  for (int x = 0; x < N; x++) {
    for (int y = 0; y < N; y++) {
      grid[x][y] = 1;
    }
  }
  int ok;
  do { // Try all cell values.
    grid[0][0]++;
    int x = 0;
    int y = 0;
    while (grid[x][y] > MAX) {
      grid[x][y] = 1;
      if (++x == N) {
        x = 0;
        y++;
      }
      grid[x][y]++;
    } // Until we find a
    ok = 1; // valid magic square.
    // Check cells all different.
    for (int x = 0; x < N; x++) {
    for (int y = 0; y < N; y++) {
    for (int x2 = 0; x2 < N; x2++) {
    for (int y2 = 0; y2 < N; y2++) {
      ok &= ((x == x2) && (y == y2)) ||
        (grid[x][y] != grid[x2][y2]);
    } } } }
    // Check column sums correct.
    for (int x = 0; x < N; x++) {
      int sum = 0;
      for (int y = 0; y < N; y++) {
        sum += grid[x][y];
      }
      ok &= (sum == TARGET);
    }
    // 3 similar checks omitted.
  } while (!ok);
  // Print out the solution.
  for (int y = 0; y < N; y++) {
    for (int x = 0; x < N; x++) {
      printf("%d ", grid[x][y]);
    }
    printf("\n");
  }
}
```

```
// Fig. 1 (right): CoPTIC model.
#define N 3
#define MAX (N * N)
#define TARGET ((((N * N) + 1) * N) / 2)
#include "coptic.h"
int main() {
  int grid[N][N];
  for (int x = 0; x < N; x++) {
    for (int y = 0; y < N; y++) {
      grid[x][y] = GUESS_INT();
      CHECK(grid[x][y] > 0 &&
        grid[x][y] <= MAX);
    }
  }
  // No need to search for the right
  // cell values explicitly. CBMC's
  // embedded SAT solver will find
  // them for us.
  // When using CBMC, we will roughly
  // set the following macros:
  // GUESS_INT() -> nondet_int()
  // CHECK(X) -> assume(X)
  // SATISFY() -> assert(0)
  // OUTPUT(X) -> { }
  // Check cells all different.
  for (int x = 0; x < N; x++) {
  for (int y = 0; y < N; y++) {
  for (int x2 = 0; x2 < N; x2++) {
  for (int y2 = 0; y2 < N; y2++) {
    CHECK(((x == x2) && (y == y2))
      || (grid[x][y] != grid[x2][y2]));
  } } } }
  // Check column sums correct.
  for (int x = 0; x < N; x++) {
    int sum = 0;
    for (int y = 0; y < N; y++) {
      sum += grid[x][y];
    }
    CHECK(sum == TARGET);
  }
  // 3 similar checks omitted.
  SATISFY();
  OUTPUT(
    for (int y = 0; y < N; y++) {
      for (int x = 0; x < N; x++) {
        printf("%d ", grid[x][y]);
      }
      printf("\n");
    }
  )
}
```
Fig. 1. Left: A brute force program to find a magic square. Right: A CoPTIC model to solve the same problem. Note the absence of code for explicit search.

Fig. 2. The architecture of the CoPTIC system. Solid arrows indicate data flow. Dashed arrows indicate inclusion of a C header file.

#### 2.2 CoPTIC Architecture

Now let us consider how CoPTIC produces the solution. Figure 2 shows the architecture of the system. After using the C compiler gcc to syntax-check and type-check the program (not shown), it runs the bounded model-checker CBMC on the program, asking it to verify absence of assertion violations. CBMC transforms the problem of finding an assertion violation in the program into a giant SAT instance and attempts to solve it using a SAT solver.

The header file coptic.h supplies definitions of GUESS, CHECK, SATISFY and OUTPUT that behave as follows: GUESS tells CBMC to pick a value nondeterministically and log it. CHECK takes a condition and tells CBMC to ignore program paths where the condition is false. SATISFY violates a trivial assertion; this tells CBMC to report failed verification and an accompanying program trace if there is a program path that reaches the assertion. OUTPUT takes a block and ignores it.

If the SAT instance is unsatisfiable, the solver reports this to CBMC. Then CBMC reports to CoPTIC that program verification was successful, as no assertion violation could be found. CoPTIC in turn reports that the constraints in the model were unsatisfiable.

Conversely, if the SAT instance is satisfiable, the solver reports a satisfying assignment to CBMC. CBMC converts this into a trace of steps of execution through the program that lead to the assertion violation. It reports to CoPTIC that program verification was unsuccessful and logs the trace that led to the assertion violation. Now CoPTIC can report that the constraints in the model are satisfiable, but it still has to show how.

To do this, it reads the nondeterministically GUESSed values from the log and writes a C header file containing a stateful replay function that, on each successive call, returns these values in the same order. It compiles the model with gcc, but uses a preprocessor macro to set a flag that changes the behaviour of coptic.h. Now GUESS calls the replay function, CHECK becomes a run-time assertion, SATISFY does nothing and OUTPUT executes the supplied block.

Finally, CoPTIC runs the compiled model. The replay function provides the variable values that satisfy the constraints in the model, the run-time assertions pass and the OUTPUT code prints the solution. Because the OUTPUT code can be arbitrary C code, it is easy to format the solution and display it in any reasonable format.

Many constraint models represent not just a single problem, but a family of similar instances. For example, instances for our magic square model might involve completing partially filled magic squares of different sizes. In this case, CoPTIC allows instance data to be imported from an external source, such as a JSON or CSV file. To achieve this, the user needs to specify a filter program that translates the instance data into definitions in a C header file; coptic.h will then include this header file. The filter can be written in any language and the CoPTIC distribution includes some examples.

#### 2.3 Planning: Knight's Tour

In the magic square example, the CoPTIC model began by guessing all the values in the square and the rest of the program was deterministic. However, this need not be the case, and we can often express a model more naturally or succinctly by mixing declarative and imperative programming. This is particularly useful for planning problems.

To demonstrate the flexibility of this approach, let us consider another well-known problem: finding a knight's tour on a chessboard. An open knight's tour is a sequence of moves made by a knight on a chessboard that visits each square exactly once. The top of Figure 3 shows a simple program to find a knight's tour on a 5 × 5 board using a recursive implementation of a backtracking search. Most of the implementation's complexity comes from using recursion to manage backtracking and from enumerating all the possible moves of a knight from a particular square.

The bottom of Figure 3 shows how we can remove this complexity in a CoPTIC model. Instead of using recursion and backtracking, we now use a simple loop that nondeterministically guesses the next move at each step. Instead of enumerating possible moves explicitly, we guess a position where the x-coordinate differs by 2 and the y-coordinate differs by 1 or vice versa.

```
// Fig. 3 (top): backtracking search.
#define M 5
#define N 5
#include <stdio.h>
int board[M][N] = {0};
int search(int x, int y, int d);
int search(int x, int y, int d) {
    if (x < 0 || x >= M || y < 0 || y >= N || board[x][y]) {
        return 0; // Check the square is on the board and unvisited.
    }
    d--; // Stop when all squares visited.
    if (d == 0) {
        printf("(%d,%d)\n", x, y);
        return 1;
    }
    board[x][y] = 1; // Don't visit this square again.
    if (search(x-2, y-1, d) || search(x+2, y-1, d) || // Try all valid
        search(x-2, y+1, d) || search(x+2, y+1, d) || // knight's moves
        search(x-1, y-2, d) || search(x+1, y-2, d) || // in sequence.
        search(x-1, y+2, d) || search(x+1, y+2, d)) {
        printf("(%d,%d)\n", x, y); // Unwind recursion on success,
        return 1; // printing moves in reverse.
    }
    board[x][y] = 0; // Backtrack on failure.
    return 0;
}
int main() {
    search(0, 0, (M*N)); // Start the search, beginning in a corner.
}
```

```
// Fig. 3 (bottom): CoPTIC model.
#define M 5
#define N 5
#include "coptic.h"
int main() {
    int board[M][N] = {0};
    int x0 = 0; // Begin in a corner.
    int y0 = 0;
    printf("(0,0)\n");
    for (int d = 1; d < M*N; d++) { // Find a sequence of M*N moves.
        int x = GUESS_INT(); // Pick the next move.
        int y = GUESS_INT();
        // Check the square is on the board and unvisited.
        CHECK(!(x < 0 || x >= M || y < 0 || y >= N || board[x][y]));
        CHECK((abs(x - x0) == 2 && abs(y - y0) == 1) || // Check it's a valid
              (abs(x - x0) == 1 && abs(y - y0) == 2)); // knight's move.
        board[x][y] = 1; // Don't visit this square again.
        OUTPUT(
            printf("(%d,%d)\n", x, y); // Print the move.
        )
        x0 = x; // The square we picked becomes the new position.
        y0 = y;
    }
    SATISFY();
}
```
Fig. 3. Top: A backtracking program to find an open Knight's Tour (with moves listed in reverse order). Bottom: A CoPTIC model to solve the same problem (with moves listed in order).

```
#include "coptic.h"
int main() {
    int x = GUESS_INT();
    // (x-2)(x-5) = x^2 - 7x + 10
    CHECK((x*x) - (7*x) + 10 == 0);
    DECLARE(x);
    ENUMERATE();
    OUTPUT(printf("%d\n", x);)
}
// Rough definitions when using CBMC:
// DECLARE(X) -> log(X); trie(X)
// ENUMERATE() -> assert(_trie == leaf);

// Trie generated for this model:
void trie(int x) {
    switch (_trie) {
        case 0:
            switch (x) {
                case 5: _trie = 1; break;
                case 2: _trie = 2; break;
                default: _trie = -1; break;
            }
            break;
        default: _trie = -1; break;
    }
}
```
Fig. 4. A CoPTIC model to enumerate integer solutions to a quadratic equation.

Knight's Tour can be solved efficiently using a program implementing backtracking search with the additional heuristic of preferring the move that leaves the fewest options for the following move. Our CoPTIC model cannot compete with this in speed of execution (or with a custom encoding in SAT [21]), but it has the advantages that it is shorter and does not require specialist knowledge of the problem, so is significantly easier to implement.

#### 2.4 Enumeration: Integer Quadratics

Next we turn our attention to constraint problems that require not only satisfying a set of constraints, but also finding an optimal solution (as measured by some objective function) or enumerating all solutions. Both of these involve making multiple calls to CBMC.

For solution enumeration, we consider the example of finding integer solutions to an equation. Figure 4 shows a CoPTIC model to find all integer solutions to a quadratic equation. This model introduces ENUMERATE, which instructs CoPTIC to enumerate all solutions.

This is not as straightforward as it might first seem. While CBMC generates a SAT instance and some SAT solvers support an option that enumerates all solutions to an instance, this would not help much here, as a model may guess and check auxiliary values that do not contribute to the solution, and these need not be unique. So we need a way for a model to indicate which values are significant, in the sense that a difference in one of these values is sufficient to make a solution distinct; this is what DECLARE does.

```
#define ORDER 4
#define BETTER(A, B) (A < B)
#include "coptic.h"
int main() {
    int a[ORDER];
    a[0] = 0;
    for (int n = 1; n < ORDER; n++) {
        a[n] = GUESS_INT();
        CHECK(a[n] > a[n-1]);
        // DECLARE(a[n]);
        OUTPUT(printf("%d ", a[n]);)
    }
    OUTPUT(printf("\n");)
    for (int i1 = 0; i1 < ORDER; i1++) {
        for (int j1 = i1+1; j1 < ORDER; j1++) {
            for (int i2 = 0; i2 < ORDER; i2++) {
                for (int j2 = i2+1; j2 < ORDER; j2++) {
                    CHECK(((i1 == i2) && (j1 == j2)) || (a[j1]-a[i1] != a[j2]-a[i2]));
                }
            }
        }
    }
    // Rough definition when using CBMC on 1st run:
    //   OPTIMIZE(X) -> log(X); assert(0)
    // Rough definition when using CBMC on 2nd run:
    //   OPTIMIZE(X) -> log(X); assert(!BETTER(X, BEST))
    // where BEST is best objective found so far.
    OPTIMIZE(a[ORDER-1]);
}
```
Fig. 5. A CoPTIC model for finding an optimal Golomb ruler.

We also need a way, within the C program, to assume that one of these values is different. In this case, we could use a single assumption to check x is not equal to the solution already found. But in general, a solution may comprise multiple values and we cannot simply check all of them at once, as they might not all be in scope simultaneously. (Consider the Knight's Tour model, where the variable holding the current position is overwritten on each iteration of the loop.) The solution CoPTIC adopts is to construct a trie of DECLAREd values for each solution, then within the model to trace progress through the trie as the program executes. Finally, ENUMERATE asserts that the current trie node is terminal; if it is not, then the solution is novel. This approach is not very efficient, as each use of DECLARE (after any loops have been unrolled) leads to another copy of the trie's "next node" function in CBMC's SAT encoding. But it does work even when there are multiple paths through a program and when the number of DECLAREd values varies between solutions. For situations where the number of values is constant and they are all available at a single point in the program, CoPTIC supports a form of DECLARE with multiple arguments.

One usually considers the problem of finding solutions to polynomial equations in the context of real numbers, not integers. So one might wonder whether CoPTIC supports GUESSing values of types other than int. Indeed it does: all primitive C types are supported. However, while (in contrast to many other constraint solvers) floating point types are supported, CBMC's implementation depends on an encoding in SAT, which does not perform very well.

#### 2.5 Optimisation: Golomb Rulers

Finally, to illustrate optimisation, we consider the Golomb ruler problem of finding a sequence of n increasing integers, starting from 0, such that the differences between all pairs taken from the sequence are unique. For a given n, an optimal Golomb ruler minimises the last number in the sequence. For n = 4, the only optimal solution, up to reflection, is 0, 1, 4, 6.

Figure 5 shows a CoPTIC model for finding an optimal Golomb ruler. The model guesses a sequence of n integers and checks that the sequence is increasing, and that all differences between pairs are unique. (Ignore the commented line for the moment.) Instead of calling SATISFY, this model calls OPTIMIZE with the last element of the sequence, which is our objective that we wish to minimise.

When CoPTIC passes this model to CBMC, it uses an implementation of OPTIMIZE that does two things. Firstly, it logs the objective, so that CoPTIC can read it afterwards. Secondly, if it has already found a feasible value of the objective, it asserts that the objective is not BETTER than that previously found. CoPTIC calls CBMC repeatedly until it is unable to find a better objective, at which point, the best found so far must be optimal.

By allowing BETTER to be defined as part of the model, CoPTIC supports not only maximisation and minimisation of numerical objectives, but also more complex objectives, such as lexicographic minimisation of a pair of values.

Returning to the problem of finding an optimal Golomb Ruler, for n = 7, there are multiple solutions. We can use CoPTIC to find them all by uncommenting the DECLARE line and replacing OPTIMIZE with ENUMERATE\_OPTIMAL. CoPTIC treats ENUMERATE\_OPTIMAL the same as OPTIMIZE until it has found an optimal solution, after which it behaves as ENUMERATE with the extra restriction that solutions must be optimal.

### 3 Practical Considerations

Now that we have seen how the guess-and-check paradigm is used for modelling and how it is implemented by CoPTIC for constraint satisfaction, optimisation and enumeration, we turn our attention to some practical details of usability and performance.

#### 3.1 Debugging Constraint Models

In program verification, a common concern is not only whether a program meets its specification, but also whether the specification is correct. In constraint programming, a similar concern applies. It is easy to under-specify a model, resulting in solutions to the model that are not solutions to the intended problem. In this case, a useful approach is to add extra logging to the model as OUTPUT. It is also easy to over-specify a model, resulting in a model with no solutions, even though the intended problem has solutions. This is harder to diagnose, but one helpful method is to comment out CHECKs until the model has a solution.

Another important concern in verification is whether the verification tool has accurately modelled the behaviour of the program being verified. Similarly, in constraint programming, we may worry whether the solution found by a solver really does satisfy the constraints. CoPTIC addresses this by turning CHECKs into assertions when running the model with nondeterminism resolved. On the occasions when the compiled program does violate one of these assertions, we have usually found that it results from an erroneous out-of-bounds array access in the model, which is undefined behaviour. A particular problem that results from CBMC's bit-level modelling of two's complement integer arithmetic is that CoPTIC may find solutions to a model that involve very large integers that overflow when added together, leading to an erroneous negative objective value. This is usually easy to avoid by CHECKing an upper bound on GUESSed integers in the model. It may also improve performance, especially for optimisation problems, where it may reduce the number of calls to CBMC.
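The overflow issue can be reproduced outside CoPTIC with a small C++ sketch. The helper names `wrapping_add` and `within_bound` are hypothetical; the first mimics 32-bit two's complement addition as a bit-level model would compute it, and the second plays the role of a CHECK on an upper bound for guessed values.

```cpp
#include <cassert>
#include <cstdint>

// 32-bit two's complement addition with wraparound. The arithmetic is done
// on unsigned values, where overflow is well defined; the conversion back
// to a signed type is modular in C++20 (and in practice on earlier compilers).
std::int32_t wrapping_add(std::int32_t a, std::int32_t b) {
  std::uint32_t sum =
      static_cast<std::uint32_t>(a) + static_cast<std::uint32_t>(b);
  return static_cast<std::int32_t>(sum);
}

// Hypothetical guard in the spirit of CHECKing an upper bound on a guess.
bool within_bound(std::int32_t x, std::int32_t upper) {
  assert(upper >= 0);
  return 0 <= x && x <= upper;
}
```

With these definitions, adding 2000000000 to itself wraps around to a negative number, whereas two values that both pass `within_bound(x, 1000000)` cannot overflow when added.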

CoPTIC keeps all files it produces during solution in a temporary directory. This includes log files from CBMC, header files for replaying nondeterminism, and output from the compiled programs (of which there may be several in the case of optimisation or enumeration). In the event of any problems, this makes it easy for a user to examine exactly what has happened.

One occasional problem is that CBMC is unable to translate the model into a SAT instance. General program verification is undecidable, so there are necessarily limits to the kinds of programs CBMC can handle. For example, it may be unable to infer a bound on the number of executions of a loop. In this case, CoPTIC will hang and CBMC's log file will show the loop in question being unrolled repeatedly, so the cause will be clear. However, we recommend avoiding this problem in the first place by using simple for loops with obvious, statically computable bounds wherever possible. We also suggest that, while use of arrays, functions and structs is fine, unbounded recursion, heap memory allocation and pointer arithmetic should be avoided. CBMC should always be able to handle programs satisfying these restrictions.

#### 3.2 Performance

CoPTIC's target audience is casual users of constraint programming. Therefore performance need not be outstanding, but it should still be acceptable. In constraint programming, performance often depends more on modelling decisions than on the efficiency of the solver, so an important factor in this regard is that different ways of modelling a problem should be easily expressible. We argue that CoPTIC's ability to mix imperative with declarative programming helps here.

Clearly there will be some overhead introduced by CBMC's translation into SAT, when compared with a translation from a dedicated constraint programming language directly into SAT. An obvious example might be use of fixed bit-width integers in the C program that are larger than necessary for the range of values taken by a variable in the model. But if these wasted high bits do not materially participate in any constraints, they will rarely lead to a conflict during SAT solving, so the SAT solver may be able to ignore them much of the time.

CBMC aims for bit-precise verification of C programs running on conventional microprocessors, so it uses a two's complement encoding for integers. This is acceptable, but Zhou and Kjellerstrand found that a sign-magnitude encoding worked better when developing PicatSAT [23]. Furthermore, for many problems where variables range over small domains, a one-hot encoding works better than a binary encoding.

### 4 Evaluation on CSPLib Problems

We claim that CoPTIC is easy to write models in and that its performance is adequate for many problems. To evaluate these claims empirically, we developed and benchmarked CoPTIC models for problems from CSPLib [12].

CSPLib is "a library of test problems for constraint solvers" expressed in natural language. The problems are drawn from a variety of domains, including operations research, combinatorial mathematics and puzzle games. Most problems include sample models written in constraint programming languages, such as MiniZinc or Essence. Some problems consist of a single instance; some consist of several similar instances. Some problems are constraint satisfaction problems; some are optimisation problems. CSPLib now contains 95 problems and has served as a focus for research in constraint programming over the past two decades [13]. For our evaluation, we restrict our attention to the 14 problems in the original 1999 release. This gives us a reasonable sample of the different kinds of problem, although there are no solution enumeration problems; see the artifact for some examples of enumeration [15].

For each CSPLib problem, we wrote a CoPTIC model. Where present in CSPLib, we also selected a MiniZinc model and an Essence model for the same instance. Where a problem included several instances, we picked one we considered to be representative. Mostly, we chose the example given in the problem specification, but in some cases these were very easy, so we chose harder instances to make the differences in performance clearer. For problem 6, we chose the largest instance listed as having multiple solutions. For problem 10, we used the hardest instance solved using SAT by Triska and Musliu [19]. For problems 12 and 13, we picked the hardest instances in CSPLib.

To benchmark performance, we ran our models using CoPTIC and recorded time taken to solve them. We measured times with two different builds of CBMC 5.57.0: one using MiniSat 2.2.1 as the solver (the standard configuration) and the other using CaDiCaL 1.4.1 (a supported compile-time option). For comparison, we also ran the MiniZinc models and the Essence models using SAT-based solvers. Note that, while these models encode the same problem, they may do so with quite different formalisations, which can have a big impact on solution time. This is fine for our purposes, as in evaluating the whole CoPTIC system, the ease with which we can write good models is at least as important as the speed of solution.

To run the MiniZinc models, we used MiniZinc 2.6.3 to convert them into FlatZinc, then PicatSAT in Picat 3.3#3 to solve them. PicatSAT uses the SAT solver Kissat 1.0.3. PicatSAT won 2nd place in the Free track of the MiniZinc Challenge 2022; Kissat won the Main track of the SAT Competition 2020. We also benchmarked a version of PicatSAT patched to use CaDiCaL 1.4.1.

To run the Essence models, we used Conjure 2.3.0 to compile to EssencePrime, then SavileRow 1.9.1 to solve using CaDiCaL 1.4.1 as the SAT solver (instead of the shipped solver CaDiCaL 1.3.0).

Table 1 shows our results. All benchmarks were run on a Debian Linux 10 machine with a 3.4 GHz Intel Core i5-7500 CPU and 64 GB of RAM, using a time limit of 1 hour. It is clear that dedicated constraint modelling languages and solvers generally perform better than CoPTIC, as one would expect. But the majority of problems are still solvable within a reasonable amount of time. Therefore this is not a problem for our intended user, who would normally be happy to trade an increase in solution time for a decrease in time and effort needed to learn how to write a model. In fact, comparing directly with just the Essence models or just the MiniZinc models, we see that the CoPTIC models led to more solutions within our time limit, although this is somewhat dependent on our choice of time limit and hardness of problem instances.

Table 1. Solution times for different CSPLib problem instances with different models and solvers. All values are times rounded to the nearest second. The time limit was 1 hour of CPU time. Times are from a single run; problems 4 and 10 showed some variation on repetition.

Using CBMC built with CaDiCaL rather than MiniSat slows down some models, but mostly results in more consistent performance. CaDiCaL is much better at proving unsatisfiability, which makes a big difference for the optimisation problems (2, 5 and 6), where unsatisfiability demonstrates optimality.

During our benchmarking, we discovered that there were some errors in the Essence models in CSPLib. The model for problem 2 (template design) omits the limit on the total number of designs in a template, so the solution it gives is infeasible. We fixed the model by adding the missing constraint. The model for problem 8 (vessel loading) has a subtle error resulting from the semantics of evaluating a function outside its defined domain, so it can never be solved. We fixed the model by changing a guard in an implication. We also found that the EssencePrime solver SavileRow ran out of memory very quickly on some problems; we suspect this is a bug in the translation to SAT.

It is difficult to evaluate ease of writing models quantitatively, although perhaps this could be done through a controlled trial with undergraduate students. But what we can do is measure the size of the models we produced in terms of source lines of code (SLoC). While there are many criticisms of SLoC, it is widely used as a metric to estimate the amount of effort needed to develop a program. Table 2 shows the size of our CoPTIC models, compared with the MiniZinc and Essence models. As is conventional, we do not count blank lines or comments. We have also chosen not to count lines used for any input data or for formatting output. For input data, this is because the formats are very similar, but conventions on line breaks may differ between them, so it is not meaningful to compare them. For formatting, Essence does not appear to support custom formatting in the models, so including formatting code would inflate the line counts for CoPTIC and MiniZinc. Furthermore, for some problems, the output format may differ significantly between the CoPTIC and MiniZinc models. For example, output for a problem involving laying out rectangles in a grid could consist of co-ordinates of the rectangles or a rendering in ASCII art.

Table 2. Number of lines of code and resulting SAT instance sizes (thousands of variables/clauses) for modelling different CSPLib problems in different languages. Blank lines, comments, input data and formatting are excluded from SLoC totals.

Again, it is clear that models written in the dedicated modelling languages tend to be smaller, as one would expect. However, the CoPTIC models are of similar size to and occasionally smaller than the MiniZinc models. The Essence models are particularly succinct because they include more complex, higher-level modelling constructs. For example, in the model for the Progressive Party problem, one of the constraints is encoded in the Essence model using universal quantification, function preimage and function composition, while the CoPTIC model expresses the same constraint using a for loop and nested array lookup. From the perspective of a casual user, while the latter is more verbose, it may be easier to write and comprehend.

Table 2 also shows the number of variables and clauses in the SAT instances generated from each model. While this is a poor metric of the difficulty of a SAT instance, it is useful here in demonstrating the extra overhead introduced by using CoPTIC, compared with a dedicated modelling language and encoding.

#### 5 Related Work

The key underlying technology in CoPTIC is the bounded model checker CBMC [7], which in turn relies on the SAT solvers MiniSat and CaDiCaL. In typical operation, CBMC aims to verify the universally quantified property that, for all paths of execution of a C program, there is no assertion violation. It does this by using a SAT solver to solve the existential problem of finding a path containing an assertion violation. If the SAT solver finds a path, CBMC reports failed verification with the path as a counterexample; if not, CBMC reports successful verification. In CoPTIC, we typically use CBMC to solve the existential problem of finding values of variables that satisfy constraints.

In the field of automated verification, bounded model checkers have been successful because of their ability to verify (or find bugs in) large programs with bit-level accuracy and minimal user annotation. Other successful bounded model checkers include SMACK [18], which uses the LLVM toolchain with Boogie as the solver, and ESBMC [8], which uses SMT solvers rather than a SAT solver.

Most modern SAT solvers use a variant of Conflict-Driven Clause Learning (CDCL). MiniSat [9] won the SAT Race 2006. Because of its good performance and publicly available, easily editable source code, it became the default choice for developers of applications that needed a SAT solver. The more modern solver CaDiCaL [4] won several tracks in the 2017 and 2018 competitions and has since also become a popular choice. The recent editions of the SAT Competition have been dominated by Kissat, Biere's rewrite of CaDiCaL in C.

Constraint programming encompasses a wide range of modelling languages and solution techniques. Because the ability of a technique to handle a problem efficiently depends significantly on how the problem is expressed, modelling of constraint problems, including the choice of modelling language, remains a big concern. Significant milestones in modelling include the release of CSPLib in 1999 [12] and the MiniZinc modelling language in 2007 [17]. Whilst MiniZinc is the most broadly supported language and has a long-running associated competition, there are many others, including Essence [11] (which supports higher-level types, such as functions), Picat [24] (which adopts a logic programming paradigm) and XCSP3 [1] (which aims to be a kind of intermediate language).

There are several constraint programming toolkits such as Gecode [6] that provide an API through which a constraint solver can be invoked from within a C program. However, these either require that the constraints be written in a separate modelling language, or that the model be built through a sequence of API calls that resembles a transliteration of a constraint program written in the solver's native language. The system closest to ours is CoJava [5], which adopts a similar guess-and-check paradigm in Java; there is a custom translation into MiniZinc [10]. As it does not use an existing, well-tested verification tool, there may be concerns about the correctness of its translation.

The main techniques implemented in general-purpose constraint solvers are backtracking search and local search, both of which can be improved by good choice of heuristics and constraint propagation. However, in recent years, translation into SAT has become a leading technique for solving constraint problems. PicatSAT [22] won the main tracks in the XCSP3 Competition 2019 and 2022, and has ranked highly in every MiniZinc Challenge since 2016.

The idea of solving a constraint problem by translating it into C and using a C program verification tool, such as CBMC, is not new, but CoPTIC automates part of this process. Verma and Yap translated XCSP3 problems into C programs [20] and used them to benchmark symbolic execution tools such as KLEE. Lester used a similar translation as the basis for Exchequer [2], which won the Mini Solver track in the XCSP3 Competition 2022. Lester has also shown how to solve the planning problem of completing an interactive fiction game by applying CBMC to a modified version of the source code [14]. Meanwhile, in the SAT Competition 2022, Manthey submitted a set of benchmarks based around using CBMC to solve the puzzle Summle [16].

#### 6 Conclusion

We have presented the CoPTIC system for constraint programming, which allows a user to write constraint models in C and solve them by translation to SAT using the bounded model checker CBMC. Our system is freely available online and easy to install, with only standard dependencies. CoPTIC supports not only constraint satisfaction problems, but also optimisation and enumeration.

These features make CoPTIC an attractive system for casual users of constraint programming. In time, it may serve as a gateway language for some to learn dedicated constraint programming languages. As well as being a useful system in its own right, CoPTIC showcases the power of automated verification tools and SAT solvers, which have advanced massively in the last two decades.

In many cases, a CoPTIC model for solving a problem will perform better than a C program that uses brute force or heuristic search. Even when it does not, we should recall that in the world of programming, it is received wisdom that "premature optimisation is the root of all evil", as it wastes development effort and increases the risk of introducing bugs. Thus the CoPTIC approach is still preferable, as it reduces development effort.

This argument also applies at the meta level. For occasional users of constraint programming, it is better to write constraint programs in a language one already knows than to expend time and effort learning a dedicated constraint programming language, even if the dedicated language ultimately allows one to write more succinct models and supports more efficient solvers. For regular users of constraint programming, the dedicated language is a clear winner, but for casual users, CoPTIC achieves an acceptable balance of ease of learning, ease of use and performance.

### Data Availability Statement

The source code and constraint models that support the findings of this study are available in Zenodo: https://doi.org/10.5281/zenodo.7313351 [15]. The constraint models were derived from CSPLib: https://www.csplib.org/.

### References



## Acacia-Bonsai: A Modern Implementation of Downset-Based LTL Realizability

Michaël Cadilhac<sup>1</sup> and Guillermo A. Pérez<sup>2</sup>

<sup>1</sup> DePaul University, Chicago, USA michael@cadilhac.name <sup>2</sup> University of Antwerp – Flanders Make, Antwerp, Belgium guillermo.perez@uantwerp.be

Abstract. We describe our implementation of downset-manipulating algorithms used to solve the realizability problem for linear temporal logic (LTL). These algorithms were introduced by Filiot et al. in the 2010s and implemented in the tools Acacia and Acacia+ in C and Python. We identify degrees of freedom in the original algorithms and provide a complete rewriting of Acacia in C++20 articulated around genericity and leveraging modern techniques for better performance. These techniques include compile-time specialization of the algorithms, the use of SIMD registers to store vectors, and several preprocessing steps, some relying on efficient Binary Decision Diagram (BDD) libraries. We also explore different data structures to store downsets. The resulting tool is competitive against comparable modern tools.

Keywords: LTL synthesis · C++ · downset · antichains · SIMD · BDD

#### 1 Introduction

Nowadays, hardware and software systems are everywhere around us. One way to ensure their correct functioning is to automatically synthesize them from a formal specification. This has two advantages over alternatives such as testing and model checking: the design part of the program-development process can be completely bypassed and the synthesized program is correct by construction.

In this work we are interested in synthesizing reactive systems [17]. These maintain a continuous interaction with their environment. Examples of reactive systems include communication, network, and multimedia protocols as well as operating systems. For the specification, we consider linear temporal logic (LTL) [27]. LTL allows one to naturally specify time dependence among the events that make up the formal specification of a system. The popularity of LTL as a formal specification language extends to, amongst others, AI [15,8,16], hybrid systems and control [6], software engineering [21], and bio-informatics [1].

The classical doubly-exponential-time synthesis algorithm can be decomposed into three steps: 1. compile the LTL formula into an automaton of exponential size [32], 2. determinize the automaton [29,26] incurring a second exponential blowup, and 3. determine the winner of a two-player zero-sum game played on the latter automaton [28]. Most alternative approaches focus on avoiding the determinization step of the algorithm. This has motivated the development of so-called Safra-less approaches, e.g., [20,11,10,31]. Worth mentioning are the on-the-fly game construction implemented in the Strix tool [24] and the downset-based (or "antichain-based") on-the-fly bounded determinization described in [13] and implemented in Acacia+ [5]. Both avoid constructing the doubly-exponential deterministic automaton. Acacia+ was not ranked in recent editions of SYNTCOMP [18] (see http://www.syntcomp.org/) since it is no longer maintained despite remaining one of the main references for new advancements in the field (see, e.g., [12,33,30,22,2]).

© The Author(s) 2023

S. Sankaranarayanan and N. Sharygina (Eds.): TACAS 2023, LNCS 13994, pp. 192–207, 2023. https://doi.org/10.1007/978-3-031-30820-8_14

Contribution. We present the Acacia approach to solving the problem at hand and propose a new implementation that allows for a variety of optimization steps. For now, we have focused on (Büchi automata) realizability, i.e., the decision problem which takes as input an automaton compiled from the LTL formula and asks whether a controller satisfying it exists. In our tool, we compile the input LTL formula into an automaton using Spot [9]. We entirely specialize our presentation on the technical problem at hand and strive to distill the algorithmic essence of the Acacia approach in that context. The main algorithm is presented in Section 3.4 and the different implementation options are listed in Section 4. Benchmarks are included in Section 6.

All benchmarks were executed on the revision of the software that can be found at: https://github.com/gaperez64/acacia-bonsai/tree/SYNTCOMP22.

### 2 Preliminaries

Throughout this paper, we assume the existence of two alphabets, I and O; although these stand for input and output, the actual definition of these two terms is slightly more complex: An input (resp. output) is a boolean combination of symbols of I (resp. O) and it is pure if it is a conjunction in which all the symbols in I (resp. O) appear exactly once; e.g., with I = {i<sub>1</sub>, i<sub>2</sub>}, the expressions ⊤ (true), ⊥ (false), and (i<sub>1</sub> ∨ i<sub>2</sub>) are inputs, and (i<sub>1</sub> ∧ ¬i<sub>2</sub>) is a pure input. Similarly, an IO is a boolean combination of symbols of I ∪ O, and it is pure if it is a conjunction in which all the symbols in I ∪ O appear exactly once. We use i, j to denote inputs and x, y for IOs. Two IOs x and y are compatible if x ∧ y ≠ ⊥.

A Büchi automaton A is a tuple (Q, q<sub>0</sub>, δ, B) with Q a set of states, q<sub>0</sub> the initial state, δ the transition relation that uses IOs as labels, and B ⊆ Q the set of Büchi states. The actual semantics of this automaton will not be relevant to our exposition; we simply note that these automata are usually defined to recognize infinite sequences of pure IOs. We assume, throughout this paper, the existence of some automaton A.

We will be interested in valuations of the states of A that encode the number of visits to Büchi states; again, we do not go into details here. We will simply speak of vectors over A for elements in ℤ<sup>Q</sup>, mapping states to integers. We will write $\vec{v}$ for such vectors, and v<sub>q</sub> for its value for state q. In practice, these vectors will range over a finite subset of ℤ, with −1 as an implicit minimum value (meaning that (−1) − 1 is still −1) and an upper bound provided by the problem.

For a vector $\vec{v}$ over A and an IO x, we define a function that takes one step back in the automaton, decreasing components that have seen Büchi states. Write χ<sub>B</sub>(q) for the function mapping a state q to 1 if q ∈ B, and 0 otherwise. We then define bwd($\vec{v}$, x) as the vector over A that maps each state p ∈ Q to:

$$\min_{\substack{(p,y,q)\in\delta\\ x\text{ compatible with }y}} \quad \left(v_q - \chi_B(q)\right)_+,$$

and we generalize this to sets: bwd(S, x) = {bwd($\vec{v}$, x) | $\vec{v}$ ∈ S}. For a set S of vectors over A and a (possibly nonpure) input i, define:

$$\text{CPre}_i(S) = S \cap \bigcup_{\substack{x \text{ pure IO} \\ x \text{ compatible with } i}} \text{bwd}(S, x) \text{ .}$$

It can be proved that iterating CPre with any possible pure input stabilizes to a fixed point that is independent of the order in which the inputs are selected. We define CPre<sup>∗</sup>(S) to be that set.
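To make the two definitions concrete, here is a minimal C++ sketch over explicitly enumerated sets of vectors. It is an illustration under stated assumptions, not the tool's code: each IO is represented directly by the list of state pairs (p, q) whose transition label it is compatible with, the truncation is read as a floor at −1 following the remark on the implicit minimum value, and a state with no compatible transition receives the upper bound `k`. The names `Vec`, `Pairs`, `bwd` and `cpre` are chosen for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <utility>
#include <vector>

using Vec = std::vector<int>;                    // one value per state
using Pairs = std::vector<std::pair<int, int>>;  // (p, q) compatible with x

// bwd(v, x): for each state p, the minimum over compatible transitions
// (p, y, q) of v_q - chi_B(q), floored at -1. States without a compatible
// transition keep the upper bound k (an assumption of this sketch).
Vec bwd(const Vec& v, const Pairs& compatible,
        const std::vector<bool>& buchi, int k) {
  assert(buchi.size() == v.size());
  Vec out(v.size(), k);
  for (auto [p, q] : compatible) {
    int stepped = std::max(-1, v[q] - (buchi[q] ? 1 : 0));
    out[p] = std::min(out[p], stepped);
  }
  return out;
}

// CPre_i(S): intersect S with the union of bwd(S, x) over the pure IOs x
// compatible with i, each given here by its list of compatible pairs.
std::set<Vec> cpre(const std::set<Vec>& S, const std::vector<Pairs>& ios,
                   const std::vector<bool>& buchi, int k) {
  std::set<Vec> result;
  for (const Pairs& x : ios)
    for (const Vec& v : S) {
      Vec w = bwd(v, x, buchi, k);
      if (S.count(w) > 0) result.insert(w);
    }
  return result;
}
```

Storing every vector explicitly is only viable for tiny examples; the downset representations discussed next avoid this enumeration.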

All the sets that we manipulate will be downsets: we say that a vector $\vec{u}$ dominates another vector $\vec{v}$ if for all q ∈ Q, u<sub>q</sub> ≥ v<sub>q</sub>, and we say that a set S is a downset if $\vec{u}$ ∈ S and $\vec{u}$ dominates $\vec{v}$ implies that $\vec{v}$ ∈ S. This allows us to implement these sets by keeping only dominating elements, which form, as they are pairwise nondominating, an antichain. In practice, it may be interesting to keep more elements than just the dominating ones or even to keep all of the elements to avoid the cost of computing domination.
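The antichain representation can be sketched in a few lines of C++. This is a naive illustration, assuming vectors are plain `std::vector<int>` of equal length; `Downset`, `dominates`, `contains` and `insert` are names chosen for the example, and the linear scans stand in for the more refined data structures a real implementation would use.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<int>;

// u dominates v if u_q >= v_q for every state q.
bool dominates(const Vec& u, const Vec& v) {
  assert(u.size() == v.size());
  for (std::size_t q = 0; q < u.size(); ++q)
    if (u[q] < v[q]) return false;
  return true;
}

// A downset stored only by its dominating (maximal) elements.
struct Downset {
  std::vector<Vec> maximal;  // pairwise nondominating: an antichain

  // v belongs to the downset iff some stored element dominates it.
  bool contains(const Vec& v) const {
    return std::any_of(maximal.begin(), maximal.end(),
                       [&](const Vec& u) { return dominates(u, v); });
  }

  // Add v and everything it dominates, keeping the antichain minimal.
  void insert(const Vec& v) {
    if (contains(v)) return;  // already represented
    maximal.erase(std::remove_if(maximal.begin(), maximal.end(),
                                 [&](const Vec& u) { return dominates(v, u); }),
                  maximal.end());
    maximal.push_back(v);
  }
};
```

Skipping the `remove_if` step corresponds to the option of keeping more elements than the dominating ones, trading memory for fewer domination checks on insertion.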

Finally, we define Safe<sub>k</sub> as the downset {i | i ≤ k}<sup>Q</sup>, i.e., all vectors with values bounded by k. We are now equipped to define the computational problem we focus on:

#### BackwardRealizability


We note, for completeness, that (for sufficiently large values of k) this problem is equivalent to deciding the realizability problem associated with A: the question has a positive answer if and only if the output player wins the Gale-Stewart game with payoff set the complement of the language of A.

#### 3 Realizability algorithm

The problem admits a natural algorithmic solution: start with the initial set, pick an input i, apply CPre<sub>i</sub> on the set, and iterate until all inputs induce no change to the set, then check whether this set contains a vector that maps q<sub>0</sub> to 0. We first introduce some degrees of freedom in this approach, then present a slight twist on that solution that will serve as a canvas for the different optimizations.

#### 3.1 Boolean states

This opportunity for optimization was identified in [4] and implemented in Acacia+; we simply introduce it in a more general setting and succinctly present the original idea when we mention how it can be implemented in Section 4.2. We start with an example. Consider the Büchi automaton from Figure 1 with q<sub>0</sub>, q<sub>1</sub> ∉ B.

Fig. 1. Small automaton with q<sub>0</sub>, q<sub>1</sub> ∉ B.

Recall that we are interested in whether the initial state can carry a nonnegative value, after CPre has stabilized. In that sense, the crucial information associated with q<sub>0</sub> is boolean in nature: is its value nonnegative or −1? Even further, this same remark can be applied to q<sub>1</sub> since q<sub>1</sub> being valued 6 or 7 is not important to the valuation of q<sub>0</sub>. Hence the set of states may be partitioned into integer-valued states and boolean-valued ones. Naturally, detecting which states can be made boolean comes at a cost and not doing it is a valid option.

#### 3.2 Actions

For each IO x, we will often have to compute bwd($\vec{v}$, x). This requires referring to the underlying Büchi automaton and checking, for each transition therein, whether x is compatible with the condition. It may be preferable to precompute, for each x, the relevant pairs (p, q) for which x can go from p to q. We call the set of such pairs the io-action of x and denote it io-act(x); in symbols:

io-act(x) = {(p, q) | (∃(p, y, q) ∈ δ)[x is compatible with y]} .

Further, as we will be computing CPre<sub>i</sub>(S) for inputs i, we abstract in a similar way the information required for this computation. We use the term input-action for the set of io-actions of IOs compatible with i and denote it i-act(i); in symbols:

$$\text{i-act}(i) = \bigcup_{\substack{x \text{ an IO} \\ \text{compatible with } i}} \{\text{io-act}(x)\} \text{ .}$$

In other words, actions contain exactly the information necessary to compute CPre. Note that from an implementation point of view, we do not require that the actions be precomputed. Indeed, when iterating through pairs (p, q) ∈ io-act(x), the underlying implementation can choose to go back to the automaton.

### 3.3 Sufficient inputs

As we consider the transitions of the Büchi automaton as being labeled by boolean expressions, it becomes more apparent that some pure IOs can be redundant. For instance, consider a Büchi automaton with I = {i}, O = {o<sub>1</sub>, o<sub>2</sub>}, but the only transitions compatible with i are labeled (i ∧ o<sub>1</sub>) and (i ∧ ¬o<sub>1</sub>). Pure IOs compatible with the first label will be (i ∧ o<sub>1</sub> ∧ o<sub>2</sub>) and (i ∧ o<sub>1</sub> ∧ ¬o<sub>2</sub>), but certainly, these two IOs have the same io-actions, and optimally, we would only consider (i ∧ o<sub>1</sub>). However, we should not consider (i ∧ o<sub>2</sub>), as it induces an io-action that is not induced by a pure IO. We will thus allow our main algorithm to select certain inputs and IOs and introduce the following notion:

Definition 1. An IO (resp. input) is valid if there exists any pure IO (resp. input) with the same io-action (resp. input-action). A set X of valid IOs is sufficient if it represents all the possible io-actions of pure IOs: {io-act(x) | x ∈ X} = {io-act(x) | x is a pure IO}. A sufficient set of inputs is defined similarly with input-actions.
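The io-actions of Section 3.2 and the redundancy in the example above can be illustrated with a small C++ sketch. It rests on simplifying assumptions that are not taken from the tool: transition labels are restricted to conjunctions of literals, encoded as a hypothetical `Cube` (a mask of constrained propositions and their required values), and a pure IO is a bit vector assigning every proposition. The function `sufficient_pure_ios` enumerates all pure IOs and keeps one representative per distinct io-action.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Hypothetical simplified label: a conjunction of literals.
struct Cube {
  std::uint32_t mask;   // propositions that the label constrains
  std::uint32_t value;  // required truth values on those propositions
};

struct Transition { int p; Cube label; int q; };

using Action = std::set<std::pair<int, int>>;  // pairs (p, q)

// A pure IO is compatible with a cube iff it agrees on every literal.
bool compatible(std::uint32_t pure_io, const Cube& c) {
  return (pure_io & c.mask) == c.value;
}

// io-act(x): all (p, q) joined by a transition whose label x satisfies.
Action io_act(std::uint32_t pure_io, const std::vector<Transition>& delta) {
  Action act;
  for (const Transition& t : delta)
    if (compatible(pure_io, t.label)) act.insert({t.p, t.q});
  return act;
}

// One representative pure IO per distinct io-action: a sufficient set.
std::map<Action, std::uint32_t> sufficient_pure_ios(
    unsigned num_props, const std::vector<Transition>& delta) {
  assert(num_props < 32);
  std::map<Action, std::uint32_t> reps;
  for (std::uint32_t x = 0; x < (1u << num_props); ++x)
    reps.emplace(io_act(x, delta), x);  // keeps the first IO per action
  return reps;
}
```

Encoding i, o<sub>1</sub>, o<sub>2</sub> as bits 0, 1, 2 and using only the two transitions of the example, the eight pure IOs collapse to three io-actions: one for each label and the empty action for IOs in which i is false.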

### 3.4 Algorithm

We solve BackwardRealizability by computing CPre<sup>∗</sup> explicitly:

Our algorithm requires that the "input-action picker" used in line 8 decides whether we have reached a fixed point. As the picker could check whether S has changed, this is without loss of generality.

The computation of CPre<sub>a</sub> is the intuitive one, optimizations therein coming from the internal representation of actions. That is, it is implemented by iterating through all io-actions compatible with a, applying bwd on S for each of them, taking the union over all these applications, and finally intersecting the result with S.

### 4 The many options at every line

The main computational costs of the algorithm are in finding input-actions and computing CPre<sub>a</sub>. For the former, reducing the number of candidates is crucial (by considering a good set of sufficient inputs). For the latter, reducing the size of the automaton (hence the dimension of the vectors) and providing efficient data types for downsets is key. Additionally, for the "input-action picker" to return an input that will make progress, it has to explore S in some way; this can again be a costly operation that would be sped up by better data structures for downsets. Let us now review these potential optimizations line by line.

#### 4.1 Preprocessing of the automaton (line 1)

In this step, one can provide a heuristic that removes certain states that do not contribute to the computation. We provide an optional step that detects surely losing states, as presented in [14].

#### 4.2 Boolean states (line 2)

We provide an implementation of the detection of boolean states, in addition to an option to not detect them. Our implementation is based on the concept of bounded state, as presented in [4]. A state is bounded if it cannot be reached from a Büchi state that lies in a nontrivial strongly connected component. This can be detected in several ways, although it is not an intrinsically costly operation.
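One simple way to detect bounded states, given as a C++ sketch under stated assumptions: the automaton is reduced to a successor list `Graph` (labels are irrelevant here), and a Büchi state is taken to lie in a nontrivial strongly connected component exactly when it can reach itself through at least one transition. The names `reach_plus` and `bounded_states` are chosen for the example; this quadratic version favours brevity over the linear-time SCC algorithms a tool would more likely use.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Graph = std::vector<std::vector<int>>;  // successors of each state

// States reachable from `start` using at least one transition.
std::vector<bool> reach_plus(const Graph& g, int start) {
  std::vector<bool> seen(g.size(), false);
  std::vector<int> stack(g[start].begin(), g[start].end());
  while (!stack.empty()) {
    int s = stack.back();
    stack.pop_back();
    if (seen[s]) continue;
    seen[s] = true;
    for (int t : g[s]) stack.push_back(t);
  }
  return seen;
}

// A state is bounded if no Büchi state lying on a cycle can reach it.
std::vector<bool> bounded_states(const Graph& g,
                                 const std::vector<bool>& buchi) {
  assert(buchi.size() == g.size());
  std::vector<bool> bounded(g.size(), true);
  for (std::size_t b = 0; b < g.size(); ++b) {
    if (!buchi[b]) continue;
    std::vector<bool> r = reach_plus(g, static_cast<int>(b));
    if (!r[b]) continue;  // b is not on a cycle: trivial component
    for (std::size_t s = 0; s < g.size(); ++s)
      if (r[s]) bounded[s] = false;  // includes b itself
  }
  return bounded;
}
```

For the graph 0 → 1 → 2 → 1, 2 → 3 with state 2 as the only Büchi state, only state 0 is bounded, since states 1, 2 and 3 are all reachable from the cycle through state 2.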

#### 4.3 Vectors and downsets (line 3)

The most basic data structure in the main algorithm is that of a vector used to give a value to the states. We provide a handful of different vector classes:


Additionally, all these implementations can be glued to an array of booleans (std::bitset) to provide a type that combines boolean and integer values. These types can optionally expose an integer that is compatible with the partial order (here, the sum of all the elements in the vector: if $\vec{u}$ dominates $\vec{v}$, then the sum of the elements in $\vec{u}$ is at least as large as that of $\vec{v}$). This value can help the downset implementations in sorting the vectors.
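To illustrate how an order-compatible integer can help a downset implementation, here is a minimal Python sketch. It is not the Acacia-Bonsai code (which is written in C++); the names `dominates` and `insert_maximal` are hypothetical, and the downset is assumed to be stored as the list of its maximal vectors, sorted by decreasing sum. Since a dominating vector has a sum at least as large, the sum allows skipping many component-wise comparisons.

```python
from typing import List, Tuple

Vector = Tuple[int, ...]


def dominates(u: Vector, v: Vector) -> bool:
    """Return True if u is component-wise greater than or equal to v."""
    return len(u) == len(v) and all(a >= b for a, b in zip(u, v))


def insert_maximal(antichain: List[Vector], v: Vector) -> List[Vector]:
    """Insert v into a list of maximal vectors kept sorted by decreasing sum.

    The sum is compatible with the partial order: a vector can only be
    dominated by vectors whose sum is at least as large as its own.
    """
    s = sum(v)
    # Only vectors with sum >= s can dominate v; the list is sorted, so stop early.
    for u in antichain:
        if sum(u) < s:
            break
        if dominates(u, v):
            return antichain  # v is already represented by the downset
    # Only vectors with sum <= s can be dominated by v.
    kept = [u for u in antichain if sum(u) > s or not dominates(v, u)]
    kept.append(v)
    kept.sort(key=sum, reverse=True)
    return kept
```

In this sketch, the sum acts as a cheap filter only; the component-wise test remains the source of truth.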

Downset types are built on top of a vector type. We provide:

<sup>3</sup> SIMD: Single Instruction Multiple Data, a set of CPU instructions & registers to compute component-wise operations on fixed-size vectors.


#### 4.4 Selecting sufficient inputs (line 5)

Recall our discussion on sufficient inputs of Section 3.3. We introduce the notion of terminal IO following the intuition that there is no restriction of the IO that would lead to a more specific action:

Definition 2. An IO x is said to be terminal if for every compatible IO y, we have io-act(x) ⊆ io-act(y). An input i is said to be terminal if for every compatible input j we have i-act(i) ⊆ i-act(j).

Our approaches to input selection focus on efficiently searching for a sufficient set of terminal IOs and inputs. The key property of terminal inputs is that they are automatically valid, while still being more general than pure inputs.

Proposition 1. Any pure IO and any pure input is terminal. Any terminal IO and any terminal input is valid.

Proof. Any pure IO is terminal. Consider a pure IO x and a compatible IO y. If (p, q) ∈ io-act(x), then there is a transition (p, z, q) ∈ δ such that x is compatible with z, and thus x ∧ z = x. Consequently, x ∧ z ∧ y = x ∧ y ≠ ⊥, hence y and z are compatible and (p, q) ∈ io-act(y). This shows that io-act(x) ⊆ io-act(y) and that x is terminal.

Any pure input is terminal. Consider now a pure input i and a compatible input j. Let io-act(x) ∈ i-act(i). It holds that x is compatible with i, hence i ∧ x ≠ ⊥. Since i is pure, i ∧ j = i, thus i ∧ j ∧ x ≠ ⊥, and x is also compatible with j, implying that io-act(x) ∈ i-act(j). This shows that i-act(i) ⊆ i-act(j) and that i is terminal.

Any terminal IO and input is valid. We prove the case for inputs, the IO case being similar. Let i be a terminal input and j be a compatible pure input (at least one exists), then i-act(i) ⊆ i-act(j). Since j is pure, it is also terminal, hence i-act(j) ⊆ i-act(i). Hence i-act(i) = i-act(j) and i is valid. □

We present a simple algorithm for computing a sufficient set of terminal IOs. This is done by iteratively refining a set P of terminal IOs, starting by assuming that {⊤} is such a set and using any counterexample to split the IOs:

```
Algorithm 2 Computing a sufficient set of terminal IOs
Input: A Büchi automaton A
Output: A sufficient set of terminal IOs
P ← {⊤}
for every label x in the automaton do
    for every element y in P do
        if x ∧ y ≠ ⊥ then
            Delete y from P
            Insert x ∧ y in P
            if ¬x ∧ y ≠ ⊥ then
                Insert ¬x ∧ y in P
return P
```
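The refinement performed by Algorithm 2 can be sketched in Python as follows. This is only an illustration under simplifying assumptions: a Boolean formula is represented explicitly by its set of satisfying valuations (a real implementation would more likely use BDDs), so that x ∧ y becomes set intersection and ¬x ∧ y becomes set difference. The helper names `top`, `var` and `sufficient_terminal_ios` are hypothetical.

```python
from itertools import product
from typing import FrozenSet, Iterable, List, Tuple

Valuation = Tuple[bool, ...]
Formula = FrozenSet[Valuation]  # a formula, stored as its set of models


def top(num_props: int) -> Formula:
    """The formula ⊤ over num_props propositions: every valuation is a model."""
    return frozenset(product((False, True), repeat=num_props))


def var(index: int, num_props: int) -> Formula:
    """The formula consisting of the single proposition number `index`."""
    return frozenset(v for v in top(num_props) if v[index])


def sufficient_terminal_ios(labels: Iterable[Formula], num_props: int) -> List[Formula]:
    """Refine {⊤} with each label until every block is inside or disjoint from it."""
    blocks = [top(num_props)]
    for x in labels:
        refined = []
        for y in blocks:
            inside = y & x   # x ∧ y
            outside = y - x  # ¬x ∧ y
            if inside:
                refined.append(inside)
                if outside:
                    refined.append(outside)
            else:
                refined.append(y)  # y is incompatible with x and is kept as is
        blocks = refined
    return blocks
```

With the labels a and a ∨ b over two propositions, this sketch yields three blocks (a, ¬a ∧ b, and ¬a ∧ ¬b), each of which is either included in or disjoint from every label.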

We provide 3 implementations of input selection:


#### 4.5 Precomputing actions (line 6)

Since computing CPre<sup>i</sup> for an input i requires going through i-act(i), possibly going back to the automaton and iterating through all transitions, it may be beneficial to precompute this set. We provide this step as an optional optimization that is intertwined with the computation of a sufficient set of IOs; for instance, rather than iterating through labels in Algorithm 2, one could iterate through all transitions, and store the set of transitions that are compatible with each terminal IO on the fly.

#### 4.6 Main loop: Picking input-actions (line 8)

We provide several implementations of the input-action picker:


#### 4.7 When are we done?

The main algorithm answers either "yes, the formula is realizable" or "don't know." Indeed, for the value of k to provide an exact value, it has to be very large and reaching a fixed point in the computation becomes impossible in practice. However, it is not necessary to restart the whole algorithm with larger values of k in order to converge towards the correct answer: one can just increase all the components of all the vectors in S (our main set), and go back to the main loop. There are thus two parameters that can be adjusted: the starting value of k and the increment to S each time the loop is restarted.

### 5 Checking unrealizability of LTL specifications

As mentioned in the preliminaries, for large values of k the BackwardRealizability problem is equivalent to a non-zero sum game whose payoff set is the complement of the language of the given automaton. More precisely, for small values of k, a negative answer for the BackwardRealizability problem does not imply that the output player does not win the game. Instead, if one is interested in whether the output player wins, a property known as determinacy [23] can be leveraged to instead ask whether a complementary property holds: does the input player win the game?

We thus need to build an automaton B for which a positive answer to BackwardRealizability translates to the previous property. To do so, we can consider the negation of the input formula, ¬φ, and invert the roles of the players, that is, swap the inputs and outputs. However, to make sure the semantics of the game is preserved, we also need to have the input player play first, and the output player react to the input player's move. To do so, we simply need to have the outputs moved one step forward (in the future, in the LTL sense). This can be done directly on the input formula, by putting an X (neXt) operator on each output. This can however make the formula much more complex.

We propose an alternative to this: Obtain the automaton for ¬φ, then push the outputs one state forward. This means that a transition (p, ⟨i, o⟩, q) is translated to a transition (p, i, q), and the output o should be fired from q. In practice, we would need to remember that output, and this would require the construction to consider every state (q, o), augmenting the number of states tremendously. Algorithm 3 for this task, however, tries to minimize the number of states (q, o) necessary by considering nonpure outputs that maximally correspond to a pure input compatible with the original transition label.


while V is nonempty do
    Pop (p, o) from V
    for every (p, x, q) ∈ δ do
        y ← x
        while y ≠ ⊥ do // Iterating through x's minterms focusing on inputs
            Let i be a pure input compatible with y
            o′ ← ∃I. x ∧ i // Extract nonpure output compatible with i
            Add (⟨p, o⟩, o ∧ i, ⟨q, o′⟩) to ∆
            If (q, o′) is not in S, add it to S and V
            y ← y ∧ ¬i
return S, ∆

#### 6 Benchmarks

#### 6.1 Protocol

For the past few years, the yardstick of performance for synthesis tools has been the SYNTCOMP competition [19]. The organizers provide a bank of nearly a thousand LTL formulas, and candidate tools are run with a time limit of one hour on each of them. The tool that solves the most instances in this timeframe wins the competition.

To benchmark our tool, we relied on the 930 LTL formulas that were used in the 2021 SYNTCOMP competition, of which about 60% are realizable. Notably, 864 of all the tests were solved in less than 20 seconds by some tool during the competition, and among the 66 tests left out, 50 were not solved by any tool. This showcases a usual trend of synthesis tools: either they solve an instance fast, or they are unlikely to solve it at all. To better focus on the fine performance differences between the tools, we set a timeout of 60 seconds for all tests.

We compared Acacia-Bonsai against itself using different choices of options, and against Acacia+ [5], Strix [24], and ltlsynt [9,25]. The benchmarks were completed on a Linux computer with the following specifications:


We present some of these results in the form of survival plots (also called cactus plots). They indicate how many instances can be solved within a set time, where the time limit is for each instance. As a rule of thumb, the lower the curve, the better. Since the tools tend to solve a lot of instances in under one second, we elected to present these graphics with a logarithmic y-axis.
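As a small illustration of how the data behind a survival plot can be derived, the following Python sketch turns per-instance running times into plot points. The function name `survival_points` and the convention that unsolved instances are recorded as `None` are assumptions made for this example; the orientation (number of solved instances on the x-axis, time on the y-axis) is chosen to match the logarithmic y-axis mentioned above.

```python
from typing import List, Optional, Tuple


def survival_points(runtimes: List[Optional[float]], timeout: float) -> List[Tuple[int, float]]:
    """Return points (n, t): n instances are each solved within t seconds.

    Instances that were not solved (None) or that exceeded the timeout
    do not contribute to the curve.
    """
    solved = sorted(t for t in runtimes if t is not None and t <= timeout)
    return [(rank, t) for rank, t in enumerate(solved, start=1)]
```

A tool whose curve stays lower reaches the same number of solved instances with a smaller per-instance time limit.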

#### 6.2 Results

The options of Acacia-Bonsai. We compared 25 different configurations of Acacia-Bonsai, in order to single out the best combination of options. We elected to start with some sensible defaults and test each parameter by diverging from the defaults by a single option each time.


Fig. 2. Reducing unrealizability to realizability. Timeout set at 20 seconds.

Despite the automaton-based approach showing better overall results, we note that this approach provides a larger automaton than the formula-based approach in about 99.5% of the tests. Additionally, the automaton-based approach offers better performance even when looking at the running time without the formula-to-automaton part of the process. This seems to indicate that the automaton that is produced is somewhat simpler for the main algorithm.

Acacia-Bonsai and foes. The following plot shows the performance of the tools together. Within our parameters, Acacia-Bonsai solves 699 tests, while Acacia+ solves 560, ltlsynt 703, and Strix 770.

Fig. 3. Survival plot for SYNTCOMP tools and Acacia-Bonsai

Instances solved by one tool but not the other. To better understand the intrinsic algorithmic competitiveness of the different tools, we study which instances were solved by our tool but not the others, and conversely:


### 7 Conclusion

We provided multiple degrees of freedom in the main algorithm for downset-based LTL realizability and implemented options for each of these degrees. In this paper, we presented the main ideas behind these. Experiments show that this careful reimplementation surpasses the performance of the original Acacia+, making Acacia-Bonsai competitive against modern LTL realizability tools. Along with implementing some optimizations present in previous implementations, we introduced several new ones: reduction of the input-output alphabet, alternative antichain data structures, different strategies for input-picking, and constructing a "shifted automaton" to test unrealizability.

A somewhat disappointing conclusion of our experiments concerns code that makes explicit use of SIMD registers, i.e., large CPU registers that support pointwise vector operations. Our experiments indicate that downset-based algorithms and downset data structures are not able to take full advantage of SIMD. In the future, we plan on investigating data structures for downsets that delay some of their computations in order to better leverage vectorized operations. Such a data structure would not provide better theoretical performances, but would potentially outperform our other data structures.

One surprise that prompts further investigation is brought by our approach to unrealizability (Section 5): we provided two options for processing the input LTL formula into an automaton that expresses a realizable game iff the original formula was unrealizable. Although one option consistently produces larger automata than the other, it appears that the downset-based realizability algorithm performs better on the larger automata. A close study of the resulting automata may help in identifying salient features of automata that are easier for the Acacia algorithm.

Lastly, we should note that this reimplementation of Acacia+ is not complete, since a few options of Acacia+ have not yet been included in Acacia-Bonsai. One such option consists in decomposing LTL formulas that are conjunctions of subformulas into smaller instances of the realizability problem. We plan on implementing this before the next edition of SYNTCOMP.

Acknowledgements. We would like to thank Véronique Bruyère for recommending the use of k-d trees as a data structure to store and manipulate downsets as well as Clément Tamines for useful conversations on these and alternative data structures. This research was partially funded by the FWO G030020N project "SAILor".

Data-Availability Statement The software presented in this article and the analysed dataset are available as [7]. In addition, the version under study is tagged in the GitHub repository of this software as:

https://github.com/gaperez64/acacia-bonsai/tree/TACAS23

### References



## **Synthesis**

## Computing Adequately Permissive Assumptions for Synthesis<sup>⋆</sup>

Ashwani Anand<sup>1</sup>, Kaushik Mallik<sup>2</sup>, Satya Prakash Nayak<sup>1(✉)</sup>, and Anne-Kathrin Schmuck<sup>1</sup>

<sup>1</sup> Max Planck Institute for Software Systems, Kaiserslautern, Germany {ashwani,sanayak,akschmuck}@mpi-sws.org

<sup>2</sup> Institute of Science and Technology Austria, Klosterneuburg, Austria kaushik.mallik@ist.ac.at

Abstract. We automatically compute a new class of environment assumptions in two-player turn-based finite graph games which characterize an "adequate cooperation" needed from the environment to allow the system player to win. Given an ω-regular winning condition Φ for the system player, we compute an ω-regular assumption Ψ for the environment player, such that (i) every environment strategy compliant with Ψ allows the system to fulfill Φ (sufficiency), (ii) Ψ can be fulfilled by the environment for every strategy of the system (implementability), and (iii) Ψ does not prevent any cooperative strategy choice (permissiveness). For parity games, which are canonical representations of ω-regular games, we present a polynomial-time algorithm for the symbolic computation of adequately permissive assumptions and show that our algorithm runs faster and produces better assumptions than existing approaches—both theoretically and empirically. To the best of our knowledge, for ω-regular games, we provide the first algorithm to compute sufficient and implementable environment assumptions that are also permissive.

Keywords: Synthesis · Two-player Games · Parity · Permissiveness.

#### 1 Introduction

Two-player ω-regular games on finite graphs are the core algorithmic components in many important problems of computer science and cyber-physical system design. Examples include the synthesis of programs which react to environment inputs, modal µ-calculus model checking, correct-by-design controller synthesis for cyber-physical systems, and supervisory control of autonomous systems.

These problems can be ultimately reduced to an abstract two-player game between an environment player and a system player, respectively capturing the external unpredictable influences and the system under design, while the game captures the non-trivial interplay between these two parts. A solution of the game is a set of decisions the system player needs to make to satisfy a given ω-regular temporal property over the states of the game, which is then used to design the sought system or its controller.

<sup>⋆</sup> S. P. Nayak and A.-K. Schmuck are supported by the DFG project 389792660 TRR 248-CPEC. A. Anand and A.-K. Schmuck are supported by the DFG project SCHM 3541/1-1. K. Mallik is supported by the ERC project ERC-2020-AdG 101020093.

© The Author(s) 2023

S. Sankaranarayanan and N. Sharygina (Eds.): TACAS 2023, LNCS 13994, pp. 211–228, 2023. https://doi.org/10.1007/978-3-031-30820-8\_15

Traditionally, two-player games over graphs are solved in a zero-sum fashion, i.e., assuming that the environment will behave arbitrarily and possibly adversarially. Although this approach results in robust system designs, it usually makes the environment too powerful to allow an implementation for the system to exist. However, in reality, many of the outlined application areas actually account for some cooperation of system components, especially if they are co-designed. In this scenario it is useful to understand how the environment (i.e., other processes) needs to cooperate to allow for an implementation to exist. This can be formalized by environment assumptions, which are ω-regular temporal properties that restrict the moves of the environment player in a synthesis game. Such assumptions can then be used as additional specifications in other components' synthesis problems to enforce the necessary cooperation (possibly in addition to other local requirements) or can be used to verify existing implementations.

For the reasons outlined above, the automatic computation of assumptions has received significant attention in the reactive synthesis community. It has been used in two-player games [8,6], both in the context of monolithic system design [11,19] as well as distributed system design [18,13].

All these works emphasize two desired properties of assumptions. They should be (i) sufficient, i.e., enable the system to win if the environment obeys its assumption and (ii) implementable, i.e., prevent the system from falsifying the assumption to vacuously win the game by not even respecting the original specification. In this paper, we claim that there is an important third property, permissiveness, i.e., the assumption retains all cooperatively winning plays in the game. This notion is crucial in the setting of distributed synthesis, as here assumptions are generated before the implementation of every component is fixed. Therefore, assumptions need to retain all feasible ways of cooperation to allow for a distributed implementation to be discovered in a decentralized manner.

While the class of assumptions considered in this paper is motivated by their use for distributed synthesis, this paper focuses only on their formalization and computation, i.e., given a two-player game over a finite graph and an ω-regular winning condition Φ for the system player, we automatically compute an adequately permissive ω-regular assumption Ψ for the environment player that formalizes the above intuition by being (i) sufficient, (ii) implementable, and (iii) permissive. The main observation that we exploit is that such adequately permissive assumptions (APA for short) can be constructed from three simple templates which can be directly extracted from a cooperative synthesis game, leading to a polynomial-time algorithm for their computation. Due to page constraints, we postpone the very interesting but largely orthogonal problem of contract-based distributed synthesis using APAs to future work.

Fig. 1: Game graphs with environment (squares) and system (circles) vertices.

To appreciate the simplicity of the assumption templates we use, consider the game graphs depicted in Fig. 1 where the system and the environment player control the circle and square vertices, respectively. Given the specification Φ = ♦□{p} (which requires the play to eventually only see vertex p), the system player can win the game in Fig. 1 (a) by requiring the environment to fully disable edge e<sub>1</sub>. This introduces the first template type, a safety template, on e<sub>1</sub>. On the other hand, the game in Fig. 1 (b) only requires that e<sub>1</sub> is taken finitely often. This is captured by our second template type, a co-liveness template, on e<sub>1</sub>. Finally, consider the game in Fig. 1 (c) with the specification Φ = □♦{p}, i.e., vertex p should be seen infinitely often. Here, the system player wins if whenever the source vertices of edges e<sub>1</sub> and e<sub>2</sub> are seen infinitely often, also one of these edges is taken infinitely often. This is captured by our third template type, a live group template, on the edge-group {e<sub>1</sub>, e<sub>2</sub>}.
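The three template types can be made concrete on ultimately periodic plays, i.e., plays of the form prefix · cycle<sup>ω</sup>. The Python sketch below is an illustration only: the function names and the vertex names used with it are hypothetical, and the live group condition is read as "if some source vertex of the group is visited infinitely often, then some edge of the group is taken infinitely often", which is an assumption about the intended formalization.

```python
from typing import List, Set, Tuple

Edge = Tuple[str, str]


def edges_of(path: List[str]) -> Set[Edge]:
    """Edges between consecutive vertices of a finite path."""
    return set(zip(path, path[1:]))


def all_edges(prefix: List[str], cycle: List[str]) -> Set[Edge]:
    """Every edge taken by the play prefix · cycle^ω (cycle must be nonempty)."""
    return edges_of(prefix + cycle + cycle[:1])


def recurrent_edges(cycle: List[str]) -> Set[Edge]:
    """Edges taken infinitely often by the play."""
    return edges_of(cycle + cycle[:1])


def satisfies_safety(prefix: List[str], cycle: List[str], unsafe: Set[Edge]) -> bool:
    """Safety template: the edges in `unsafe` are never taken."""
    return not (all_edges(prefix, cycle) & unsafe)


def satisfies_coliveness(cycle: List[str], colive: Set[Edge]) -> bool:
    """Co-liveness template: the edges in `colive` are taken only finitely often."""
    return not (recurrent_edges(cycle) & colive)


def satisfies_live_group(cycle: List[str], group: Set[Edge]) -> bool:
    """Live group template (assumed reading, see the lead-in)."""
    sources = {src for src, _ in group}
    if not (sources & set(cycle)):
        return True  # no source is visited infinitely often
    return bool(recurrent_edges(cycle) & group)
```

For instance, a play that takes an edge once and then loops elsewhere violates a safety template on that edge but satisfies the corresponding co-liveness template.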

Contribution. The main contribution of this paper is to show that APAs can always be composed from the three outlined assumption templates and can be computed in polynomial time.

Using a set of benchmark examples taken from SYNTCOMP [1] and a prototype implementation of our algorithm in our new tool SImPA, we empirically show that our algorithm is both faster and produces more desirable solutions than existing approaches. In addition, we apply SImPA to the well-known 2-client arbiter synthesis benchmark from [21], which is known to only allow for an implementation of the arbiter if the clients' moves are suitably restricted. We show that applying SImPA to the unconstrained arbiter synthesis problem yields assumptions on the clients which are less restrictive but conceptually similar to the ones typically used in the literature.

Related Work. The problem of automatically computing environment assumptions for synthesis was already addressed by Chatterjee et al. [8]. However, their class of assumptions does not, in general, allow the construction of permissive assumptions. Further, computing their assumptions is an NP-hard problem, while our algorithm computes APAs in O(n<sup>4</sup>) time for a parity game with n vertices. The difference in the complexity arises because Chatterjee et al. require minimality of the assumptions. On the other hand, we trade minimality for permissiveness which allows us to utilize cooperative games, which are easier to solve.

When considering cooperative solutions of non-zerosum games, related works either fix strategies for both players [7,14], assume a particularly rational behavior of the environment [4] or restrict themselves to safety assumptions [18]. In contrast, we do not make any assumption on how the environment chooses its strategy. Finally, in the context of specification-repair in zerosum games multiple automated methods for repairing environment models exist, e.g., [22,15,16,20,8]. Unfortunately, all of these methods fail to provide permissiveness. A recent work by Cavezza et al. [6] computes a minimally restrictive set of assumptions but only for GR(1) specifications, which are a strict subclass of the problem considered in our work. To the best of our knowledge, we propose the first fully automated algorithm for computing permissive assumptions for general ω-regular games.

### 2 Preliminaries

Notation. We use N to denote the set of natural numbers including zero. Given two natural numbers a, b ∈ N with a < b, we use [a; b] to denote the set {n ∈ N | a ≤ n ≤ b}. For any given set [a; b], we write i ∈<sub>even</sub> [a; b] and i ∈<sub>odd</sub> [a; b] as shorthand for i ∈ [a; b] ∩ {0, 2, 4, . . .} and i ∈ [a; b] ∩ {1, 3, 5, . . .} respectively. Given two sets A and B, a relation R ⊆ A × B, and an element a ∈ A, we write R(a) to denote the set {b ∈ B | (a, b) ∈ R}.

Languages. Let Σ be a finite alphabet. The notations Σ<sup>∗</sup> and Σ<sup>ω</sup> denote the set of finite and infinite words over Σ, respectively, and Σ<sup>∞</sup> is equal to Σ<sup>∗</sup> ∪ Σ<sup>ω</sup>. For any word w ∈ Σ<sup>∞</sup>, w<sub>i</sub> denotes the i-th symbol in w. Given two words u ∈ Σ<sup>∗</sup> and v ∈ Σ<sup>∞</sup>, the concatenation of u and v is written as the word uv.

Game graphs. A game graph is a tuple G = (V, E) where (V, E) is a finite directed graph with vertices V and edges E, and V = V<sup>0</sup> ⊎ V<sup>1</sup> is a partition of V. Without loss of generality, we assume that for every v ∈ V there exists v′ ∈ V s.t. (v, v′) ∈ E. For the purpose of this paper, the system and the environment players will be denoted by Player 0 and Player 1, respectively. A play is a finite or infinite sequence of vertices ρ = v<sub>0</sub>v<sub>1</sub> . . . ∈ V<sup>∞</sup>. A play prefix p = v<sub>0</sub>v<sub>1</sub> · · · v<sub>k</sub> is a finite play.

Winning conditions. Given a game graph G, we consider winning conditions specified using a formula Φ in linear temporal logic (LTL) over the vertex set V , that is, we consider LTL formulas whose atomic propositions are sets of vertices V . In this case the set of desired infinite plays is given by the semantics of Φ over G, which is an ω-regular language L(Φ) ⊆ V <sup>ω</sup>. Every game graph with an arbitrary ω-regular set of desired infinite plays can be reduced to a game graph (possibly with an extended set of vertices) with an LTL winning condition, as above. The standard definitions of ω-regular languages and LTL are omitted for brevity and can be found in standard textbooks [3].

Games and strategies. A two-player (turn-based) game is a pair G = (G, Φ) where G is a game graph and Φ is a winning condition over G. A strategy of Player i, i ∈ {0, 1}, is a partial function π<sup>i</sup> : V<sup>∗</sup>V<sup>i</sup> → V such that for every pv ∈ V<sup>∗</sup>V<sup>i</sup> for which π<sup>i</sup> is defined, it holds that π<sup>i</sup>(pv) ∈ E(v). Given a strategy π<sup>i</sup>, we say that the play ρ = v<sub>0</sub>v<sub>1</sub> . . . is compliant with π<sup>i</sup> if v<sub>k−1</sub> ∈ V<sup>i</sup> implies v<sub>k</sub> = π<sup>i</sup>(v<sub>0</sub> . . . v<sub>k−1</sub>) for all k ∈ dom(ρ). We refer to a play compliant with π<sup>i</sup> and a play compliant with both π<sup>0</sup> and π<sup>1</sup> as a π<sup>i</sup>-play and a π<sup>0</sup>π<sup>1</sup>-play, respectively. We collect all plays compliant with π<sup>i</sup>, and compliant with both π<sup>0</sup> and π<sup>1</sup> in the sets L(π<sup>i</sup>) and L(π<sup>0</sup>π<sup>1</sup>), respectively.
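The notion of compliance can be illustrated with a short Python sketch. It makes simplifying assumptions that are not part of the definition above: strategies are total Python functions taking the play prefix as a list, and only finite play prefixes are examined. The names `is_compliant` and `unroll` are hypothetical.

```python
from typing import Callable, List, Sequence, Set

# A strategy maps a play prefix (ending in a vertex of its player) to a successor.
Strategy = Callable[[Sequence[str]], str]


def is_compliant(play: List[str], owner_vertices: Set[str], strategy: Strategy) -> bool:
    """Check that every move made from a vertex of the player follows the strategy."""
    return all(
        play[k] == strategy(play[:k])
        for k in range(1, len(play))
        if play[k - 1] in owner_vertices
    )


def unroll(start: str, steps: int, player0_vertices: Set[str],
           pi0: Strategy, pi1: Strategy) -> List[str]:
    """Build the first `steps` moves of the play compliant with both strategies."""
    play = [start]
    for _ in range(steps):
        strategy = pi0 if play[-1] in player0_vertices else pi1
        play.append(strategy(play))
    return play
```

Once both strategies are fixed and total, the play from a given start vertex is unique, which is why `unroll` returns a single sequence.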

Winning. Given a game G = (G, Φ), a strategy π<sup>i</sup> is (surely) winning for Player i if L(π<sup>i</sup>) ⊆ L(Φ), i.e., a Player 0 strategy π<sup>0</sup> is winning if for every Player 1 strategy π<sup>1</sup> it holds that L(π<sup>0</sup>π<sup>1</sup>) ⊆ L(Φ). Similarly, a fixed strategy profile (π<sup>0</sup>, π<sup>1</sup>) is cooperatively winning if L(π<sup>0</sup>π<sup>1</sup>) ⊆ L(Φ). We say that a vertex v ∈ V is winning for Player i (resp. cooperatively winning) if there exists a winning strategy π<sup>i</sup> (resp. a cooperatively winning strategy profile (π<sup>0</sup>, π<sup>1</sup>)) s.t. π<sup>i</sup>(v) is defined. We collect all winning vertices of Player i in the Player i winning region ⟨⟨i⟩⟩Φ ⊆ V and all cooperatively winning vertices in the cooperative winning region ⟨⟨0, 1⟩⟩Φ. We note that ⟨⟨i⟩⟩Φ ⊆ ⟨⟨0, 1⟩⟩Φ for both i ∈ {0, 1}.

### 3 Adequately Permissive Assumptions for Synthesis

Given a two-player game G, the goal of this paper is to compute assumptions on Player 1 (i.e., the environment), such that both players cooperate just enough to fulfill Φ while retaining all possible cooperative strategy choices. Towards a formalization of this intuition, we define winning under assumptions.

Definition 1. Let G = (G = (V, E), Φ) be a game and Ψ be an LTL formula over V. Then a Player 0 strategy π<sup>0</sup> is winning in G under assumption Ψ, if for every Player 1 strategy π<sup>1</sup> s.t. L(π<sup>1</sup>) ⊆ L(Ψ) it holds that L(π<sup>0</sup>π<sup>1</sup>) ⊆ L(Φ). We denote by ⟨⟨0⟩⟩<sup>Ψ</sup>Φ the set of vertices from which such a Player 0 strategy exists.

We remark that the 'winning-under-assumption' strategies π<sup>0</sup> from Def. 1 satisfy two simple but interesting properties: anti-monotonicity (if π<sup>0</sup> is winning under an assumption, then it is so under every stronger assumption), and conjunctivity (if π<sup>0</sup> is winning under two different assumptions, then it is so under their conjunction). However, they do not satisfy disjunctivity (see [2, Sec. 3.1] for an example). In addition, we remark that the definition of 'winning-under-assumption' in terms of plays (rather than strategies) might seem more natural to some readers. We refer these readers to the full version of the paper [2, Sec. 3.1] for an in-depth discussion on the differences of these definitions.

We now see that the assumption Ψ introduced in Def. 1 weakens the strategy choices of the environment player (Player 1). We call assumptions sufficient if this weakening is strong enough to allow Player 0 to win from every vertex in the cooperative winning region.

Definition 2. An assumption Ψ is sufficient for (G, Φ) if ⟨⟨0⟩⟩<sup>Ψ</sup>Φ ⊇ ⟨⟨0, 1⟩⟩Φ.

Unfortunately, sufficient assumptions can be abused to change the given synthesis problem in an unintended way. Consider for instance the game in Fig. 2 (left) with Φ = ♦{v<sub>0</sub>} and Ψ = □♦e<sub>1</sub>. Here, there is no strategy π<sup>1</sup> for Player 1 such that L(π<sup>1</sup>) ⊆ L(Ψ) as the system can always falsify the assumption by simply not choosing e<sub>1</sub> infinitely often in v<sub>1</sub>. Therefore, any Player 0 strategy is winning under assumption even if Φ is violated. The assumption Ψ, however, is trivially sufficient, as ⟨⟨0⟩⟩<sup>Ψ</sup>Φ = V. In order to prevent sufficient assumptions from being falsifiable and thereby enabling vacuous winning, we define the notion of implementability, which ensures that Ψ solely restricts Player 1 moves.

Definition 3. An assumption Ψ is implementable for (G, Φ) if ⟨⟨1⟩⟩Ψ = V.

Fig. 2: Two-player games with Player 1 (squares) and Player 0 (circles) vertices.

A sufficient and implementable assumption ensures that the cooperative winning region of the original game coincides with the winning region under that assumption, i.e., ⟨⟨0⟩⟩<sup>Ψ</sup>Φ = ⟨⟨0, 1⟩⟩Φ. However, all cooperative strategy choices of both players might still not be retained, which is ensured by the notion of permissiveness.

Definition 4. An assumption Ψ is permissive for (G, Φ) if L(Φ) ⊆ L(Ψ).

This notion of permissiveness is motivated by the intended use of assumptions for compositional synthesis. In the simplest scenario of two interacting processes, two synthesis tasks—one for each process—are considered in parallel. Here, generated assumptions in one synthesis task are used as additional specifications in the other synthesis problem. Therefore, permissiveness is crucial to not "skip" over possible cooperative solutions—each synthesis task needs to keep all allowed strategy choices for both players intact to allow for compositional reasoning. This scenario is illustrated in the following example to motivate the considered class of assumptions. Formalizing assumption-based compositional synthesis in general is however out of the scope of this paper.

Example 1. Consider the (non-zerosum) two-player game in Fig. 2 (middle) with a different specification for each player, namely Φ<sub>0</sub> = ♦{v<sub>1</sub>, v<sub>2</sub>} and Φ<sub>1</sub> = ♦{v<sub>1</sub>}. Now consider two candidate assumptions Ψ<sub>0</sub> = ♦¬e<sub>1</sub> and Ψ′<sub>0</sub> = (♦v<sub>1</sub> ⟹ ♦e<sub>2</sub>) on Player 1. Notice that both assumptions are sufficient and implementable for (G, Φ<sub>0</sub>). However, Ψ′<sub>0</sub> does not allow the play {v<sub>1</sub>}<sup>ω</sup> and hence is not permissive whereas Ψ<sub>0</sub> is permissive for (G, Φ<sub>0</sub>). As a consequence, there is no way Player 1 can satisfy both her objective Φ<sub>1</sub> and the assumption Ψ′<sub>0</sub> even if Player 0 cooperates, since L(Φ<sub>1</sub>) ∩ L(Ψ′<sub>0</sub>) = ∅. However, under the assumption Ψ<sub>0</sub> on Player 1 and assumption Ψ<sub>1</sub> = ♦¬e<sub>3</sub> on Player 0 (which is sufficient and implementable for (G, Φ<sub>1</sub>) if we interchange the vertices of the players), they can satisfy both their own objectives and the assumptions on themselves. Therefore, they can collectively satisfy both their objectives.

We also remark that for this example, the algorithm in [9] outputs Ψ′<sub>0</sub> as the desired assumption for game (G, Φ<sub>0</sub>) and their used assumption formalism is not rich enough to capture assumption Ψ<sub>0</sub>. This shows that the assumption type we are interested in is not computable by the algorithm from [9].

Definition 5. An assumption Ψ is called adequately permissive (an APA for short) for (G, Φ) if it is sufficient, implementable and permissive.

### 4 Computing Adequately Permissive Assumptions (APA)

In this section, we present our algorithm to compute adequately permissive assumptions (APA for short) for parity games, which are canonical representations of ω-regular games. For a gradual exposition of the topic, we first present algorithms for simpler winning conditions, namely safety (Sec. 4.2), Büchi (Sec. 4.3), and Co-Büchi (Sec. 4.4), which are used as building blocks while presenting the algorithm for parity games (Sec. 4.5). All proofs omitted can be found in the full version [2]. Let us first introduce some preliminaries.

#### 4.1 Preliminaries


We use symbolic fixpoint algorithms expressed in the µ-calculus [17] to compute the winning regions and to generate assumptions in simple post-processing steps.

Set Transformers. Let G = (V = V<sup>0</sup> ⊎ V<sup>1</sup>, E) be a game graph, U ⊆ V be a subset of vertices, and a ∈ {0, 1} be the player index. Then we define two types of predecessor operators:

$$\text{pre}\_G(U) = \left\{ v \in V \mid \exists u \in U. \ (v, u) \in E \right\} \tag{1}$$

$$\mathsf{cpre}\_{G}^{a}(U) = \{v \in V^{a} \mid v \in \mathsf{pre}\_{G}(U)\} \cup \{v \in V^{1-a} \mid \forall (v, u) \in E. \, u \in U\} \tag{2}$$

$$\mathsf{cpre}\_{G}^{a,1}(U) = \mathsf{cpre}\_{G}^{a}(U) \cup U \tag{3}$$

$$\mathsf{cpre}\_{G}^{a,i}(U) = \mathsf{cpre}\_{G}^{a}(\mathsf{cpre}\_{G}^{a,i-1}(U)) \cup \mathsf{cpre}\_{G}^{a,i-1}(U) \text{ with } i \ge 1 \tag{4}$$

The predecessor operator pre<sub>G</sub>(U) computes the set of vertices with at least one successor in U. The controllable predecessor operators cpre<sup>a</sup><sub>G</sub>(U) and cpre<sup>a,i</sup><sub>G</sub>(U) compute the set of vertices from which Player a can force visiting U in at most one and at most i steps, respectively. In the following, we introduce the attractor operator attr<sup>a</sup><sub>G</sub>(U) that computes the set of vertices from which Player a can force at least a single visit to U in finitely many but nonzero<sup>3</sup> steps:

$$\mathbf{attr}\_G^a(U) = \left(\bigcup\_{i \ge 1} \mathbf{cpre}^{a,i}(U)\right) \backslash U \tag{5}$$

When clear from the context, we drop the subscript G from these operators.
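The set transformers (1), (2) and (5) can be illustrated on an explicitly represented graph. The following is a minimal sketch, assuming a game graph stored as a set of edge pairs and a hypothetical `owner` map from vertices to player indices; the example graph and all function names are assumptions made for illustration and are not taken from the paper or from SImPA.

```python
from typing import Dict, Set, Tuple

Vertex = str
Edge = Tuple[Vertex, Vertex]


def pre(edges: Set[Edge], target: Set[Vertex]) -> Set[Vertex]:
    """Vertices with at least one successor in target, as in (1)."""
    return {v for (v, u) in edges if u in target}


def cpre(edges: Set[Edge], owner: Dict[Vertex, int], a: int,
         target: Set[Vertex]) -> Set[Vertex]:
    """Controllable predecessor of Player a, read literally from (2)."""
    result = set()
    for v, player in owner.items():
        succ = {u for (w, u) in edges if w == v}
        if player == a and succ & target:
            result.add(v)          # Player a picks a successor in target
        elif player != a and succ <= target:
            result.add(v)          # every move of the opponent stays in target
    return result


def attr(edges: Set[Edge], owner: Dict[Vertex, int], a: int,
         target: Set[Vertex]) -> Set[Vertex]:
    """Attractor as in (5): iterate cpre^{a,i} and remove target at the end."""
    reach = cpre(edges, owner, a, target) | target
    while True:
        nxt = cpre(edges, owner, a, reach) | reach
        if nxt == reach:
            return reach - target
        reach = nxt


# Hypothetical three-vertex example: v0 and v2 belong to Player 0, v1 to Player 1.
OWNER = {"v0": 0, "v1": 1, "v2": 0}
EDGES = {("v0", "v1"), ("v0", "v2"), ("v1", "v0"), ("v1", "v1"), ("v2", "v2")}
```

On this example, Player 0 can force a visit to `{"v2"}` only from `v0`, since Player 1 may loop in `v1` forever.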

Fixpoint Algorithms in the µ-calculus. µ-calculus [17] offers a succinct representation of symbolic algorithms (i.e., algorithms manipulating sets of vertices instead of individual vertices) over a game graph G. The formulas of the µ-calculus, interpreted over a 2-player game graph G, are given by the grammar

$$\phi := p \mid X \mid \phi \cup \phi \mid \phi \cap \phi \mid pre(\phi) \mid \mu X.\phi \mid \nu X.\phi$$

where p ranges over subsets of V, X ranges over a set of formal variables, pre ranges over monotone set transformers in {pre, cpre<sup>a</sup>, attr<sup>a</sup>}, and µ and ν denote, respectively, the least and the greatest fixed point of the functional defined as X ↦ φ(X). Since the operations ∪, ∩, and the set transformers pre are all monotonic, the fixed points are guaranteed to exist, due to the Knaster-Tarski Theorem [5]. We omit the (standard) semantics of formulas (see [17]).

A µ-calculus formula evaluates to a set of vertices over G, and the set can be computed by induction over the structure of the formula, where the fixed points are evaluated by iteration. The reader may note that pre, cpre and attr can be computed in time polynomial in the number of vertices.
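Evaluating fixed points by iteration can be sketched as follows. This is an illustrative sketch only: the helpers `lfp` and `gfp` and the four-vertex graph are assumptions introduced here, and termination relies on the iterated function being monotone over a finite vertex set.

```python
def pre(edges, target):
    """Vertices with at least one successor in target."""
    return {v for (v, u) in edges if u in target}


def lfp(step):
    """Least fixed point: iterate a monotone step function from the empty set."""
    x = set()
    while True:
        nxt = step(x)
        if nxt == x:
            return x
        x = nxt


def gfp(step, top):
    """Greatest fixed point: iterate a monotone step function from the full set."""
    x = set(top)
    while True:
        nxt = step(x)
        if nxt == x:
            return x
        x = nxt


# Hypothetical graph: a -> b -> c, with self-loops on c and d.
VERTICES = {"a", "b", "c", "d"}
EDGES = {("a", "b"), ("b", "c"), ("c", "c"), ("d", "d")}

# mu X. U ∪ pre(X): vertices that can reach U = {c}.
can_reach_c = lfp(lambda x: {"c"} | pre(EDGES, x))

# nu Y. U ∩ pre(Y): vertices that can stay in U = {a, b, d} forever.
can_stay = gfp(lambda y: {"a", "b", "d"} & pre(EDGES, y), VERTICES)
```

Each iteration costs one evaluation of `pre`, and the number of iterations is bounded by the number of vertices because the iterates grow (or shrink) strictly until the fixed point is reached.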

<sup>3</sup> In existing literature, usually U ⊆ attr<sup>a</sup> (U), i.e., attr<sup>a</sup> (U) contains vertices from which U is visited in zero steps. We exclude U from attr<sup>a</sup> (U) for a technical reason.

#### 4.2 Safety Games

A safety game is a game 𝒢 = (G, Φ) with Φ := □U for some U ⊆ V, and a play fulfills Φ if it never leaves U. APAs for safety games disallow every Player 1 move that leaves the cooperative winning region in G w.r.t. Safety(U). This is formalized in the following theorem.

Theorem 1. Let 𝒢 = (G = (V, E), □U) be a safety game, Z<sup>∗</sup> = νY. U ∩ pre(Y), and S = {(u, v) ∈ E | u ∈ V<sup>1</sup> ∩ Z<sup>∗</sup> ∧ v ∉ Z<sup>∗</sup>}. Then Z<sup>∗</sup> = ⟨⟨0, 1⟩⟩□U and<sup>4</sup>

$$
\Psi\_{\text{UNSAFE}}(S) := \Box \bigwedge\_{e \in S} \neg e,\tag{6}
$$

is an APA for the game 𝒢. We denote by UnsafeA(G, U) the algorithm computing S as above, which runs in time O(n<sup>2</sup>), where n = |V|.

We call the LTL formula in (6) a safety template and assumptions that solely use this template safety assumptions.
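The construction in Theorem 1 can be sketched directly: compute Z<sup>∗</sup> as a greatest fixed point and collect the Player 1 edges that leave it. The code below is a minimal sketch under the assumption of an explicit edge-set representation; the function name `unsafe_edges` and the example graph are hypothetical and do not reflect the SImPA implementation.

```python
def pre(edges, target):
    """Vertices with at least one successor in target."""
    return {v for (v, u) in edges if u in target}


def unsafe_edges(vertices, edges, owner, safe):
    """Return (Z*, S) with Z* = nu Y. safe ∩ pre(Y) and S the Player 1 edges leaving Z*."""
    z = set(vertices)
    while True:
        nxt = safe & pre(edges, z)
        if nxt == z:
            break
        z = nxt
    s = {(u, v) for (u, v) in edges if owner[u] == 1 and u in z and v not in z}
    return z, s


# Hypothetical example: Player 1 owns u1 and may step to the vertex "bad" outside U.
OWNER = {"u0": 0, "u1": 1, "bad": 0}
EDGES = {("u0", "u1"), ("u1", "u0"), ("u1", "bad"), ("bad", "bad")}
Z_STAR, S = unsafe_edges(set(OWNER), EDGES, OWNER, {"u0", "u1"})
```

Here the resulting safety assumption forbids exactly the edge `("u1", "bad")`; all other Player 1 behavior remains unrestricted.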

#### 4.3 Live Group Assumptions for Büchi Games

Büchi games. A Büchi game is a game 𝒢 = (G, Φ) where Φ = □♦U for some U ⊆ V. Intuitively, a play is winning for a Büchi game if it visits the vertex set U infinitely often. We first recall that the cooperative winning region ⟨⟨0, 1⟩⟩□♦U can be computed by a two-nested symbolic fixpoint algorithm [10]

$$\text{Büchi}(G, U) := \nu Y . \mu X. \ (U \cap \text{pre}(Y)) \cup (\text{pre}(X)). \tag{7}$$

Live group templates. Given the standard algorithm in (7), the set X<sup>i</sup> computed in the i-th iteration of the fixpoint variable X in the last iteration of Y carries a lot of information for constructing a useful assumption for the Büchi game 𝒢. To see this, recall that X<sup>i</sup> contains all vertices which have an edge to vertices which can reach U in at most i − 1 steps [10, sec. 3.2]. Hence, for all Player 1 vertices in X<sup>i</sup> \ X<sup>i−1</sup> we need to assume that Player 1 always eventually makes progress towards U by moving to X<sup>i−1</sup>. This can be formalized by a so-called live group template.
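To make the role of the iterates X<sup>i</sup> concrete, the following sketch evaluates the nested fixpoint (7) on an explicit graph and records the X-iterates of the last Y-iteration. It is an illustrative sketch only; the function name `buchi`, the list `iterates` (whose entry at index i − 1 plays the role of X<sup>i</sup>) and the example graph are assumptions.

```python
def pre(edges, target):
    """Vertices with at least one successor in target."""
    return {v for (v, u) in edges if u in target}


def buchi(vertices, edges, target):
    """Evaluate nu Y. mu X. (target ∩ pre(Y)) ∪ pre(X).

    Returns the fixed point Z* and the iterates of X from the last iteration of Y.
    """
    y = set(vertices)
    while True:
        x, iterates = set(), []
        while True:
            nxt = (target & pre(edges, y)) | pre(edges, x)
            if nxt == x:
                break
            x = nxt
            iterates.append(set(x))
        if x == y:                 # Y is stable, so this was the last outer iteration
            return y, iterates
        y = x


# Hypothetical example: cycle a -> b -> c -> a, plus a sink d reachable from c.
VERTICES = {"a", "b", "c", "d"}
EDGES = {("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("d", "d")}
Z_STAR, ITERATES = buchi(VERTICES, EDGES, {"a"})
```

In this example the sink `d` is excluded from the cooperative winning region, and the iterates grow backwards along the cycle from the target vertex `a`.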

Definition 6. Let G = (V, E) be a game graph. Then a live group H = {e<sub>j</sub>}<sub>j≥0</sub> is a set of edges e<sub>j</sub> = (s<sub>j</sub>, t<sub>j</sub>) with source vertices src(H) := {s<sub>j</sub>}<sub>j≥0</sub>. Given a set of live groups H<sup>ℓ</sup> = {H<sub>i</sub>}<sub>i≥0</sub>, we define a live group template as

$$\Psi\_{\text{LIVE}}(H^{\ell}) := \bigwedge\_{i \ge 0} \Box \Diamond src(H\_i) \implies \Box \Diamond H\_i. \tag{8}$$

<sup>4</sup> We use e = (u, v) in LTL formulas as syntactic sugar for u ∧ ◯v, where ◯ is the LTL next operator. A set of edges E′ = {e<sub>i</sub>}<sub>i∈[0;k]</sub>, when used as an atomic proposition, is syntactic sugar for ⋁<sub>i∈[0;k]</sub> e<sub>i</sub>.

The live group template says that if some vertex from the source of a live group is visited infinitely often, then some edge from this group should be taken infinitely often. We will use this template to give the assumptions for Büchi games.

Remark 1. Note that the assumptions computed by Chatterjee et al. [8] use live edges, i.e., singleton live groups, and hence they are less expressive. In particular, there are instances of Büchi games where the permissive assumptions cannot be expressed using live edges but can be expressed using live groups, e.g., in Fig. 1 (c) the live edge assumption □♦e<sub>1</sub> ∧ □♦e<sub>2</sub> is sufficient but not permissive, whereas the live group assumption □♦src(H) ⟹ □♦H with H = {e<sub>1</sub>, e<sub>2</sub>} is an APA.
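The difference between live edges and live groups can be checked on ultimately periodic plays, where only the repeated cycle determines what happens infinitely often. The sketch below evaluates the live group template (8) on such a cycle; the function names and the vertices `v`, `a`, `b` are hypothetical and do not correspond to Fig. 1.

```python
def cycle_edges(cycle):
    """Edges taken infinitely often when the vertex list `cycle` is repeated forever."""
    return {(cycle[i], cycle[(i + 1) % len(cycle)]) for i in range(len(cycle))}


def satisfies_live_groups(cycle, live_groups):
    """Check template (8) on the play that eventually repeats `cycle` forever."""
    seen = set(cycle)
    taken = cycle_edges(cycle)
    for group in live_groups:
        sources = {s for (s, _) in group}
        if seen & sources and not (taken & group):
            return False           # a source recurs, but no edge of the group does
    return True


# One live group with two edges versus two singleton live groups (live edges).
GROUP = [{("v", "a"), ("v", "b")}]
SINGLETONS = [{("v", "a")}, {("v", "b")}]

PLAY_VIA_A = ["v", "a"]            # repeats v -> a -> v -> ...
PLAY_STUCK = ["v"]                 # repeats the self-loop v -> v
```

The play through `a` satisfies the group assumption but violates the singleton version, which additionally demands that the edge to `b` is taken infinitely often. This mirrors the loss of permissiveness described in the remark.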

In the context of the fixpoint computation of (7), we can construct live groups H<sup>ℓ</sup> = {H<sub>i</sub>}<sub>i≥0</sub> where each H<sub>i</sub> contains all edges of Player 1 which originate in X<sup>i</sup> \ X<sup>i−1</sup> and end in X<sup>i−1</sup>. Then the live group assumption in (8) precisely captures the intuition that, in order to visit U infinitely often, Player 1 should take edges in H<sub>i</sub> infinitely often if vertices in src(H<sub>i</sub>) are seen infinitely often. Unfortunately, it turns out that this live group assumption is not permissive. The reason is that it also restricts Player 1 on those vertices from which she will go towards U anyway. For example, consider the game in Fig. 2 (right). Here, defining live groups through computations of (10) will mark e<sub>1</sub> as a live group, but then (v2v1v0)<sup>ω</sup> will be in L(Φ) but not in the language of the assumption. Here the permissive assumption would be Ψ = true.

Accelerated fixpoint computation. In order to provide permissiveness, we use a slightly modified fixpoint algorithm that computes the same set Z <sup>∗</sup> but allows us to extract permissive assumptions directly from the fixpoint computations. Towards this goal, we introduce the together predecessor operator.

$$\mathsf{tpre}\_G(U) = \mathsf{attr}\_G^0(U) \cup \mathsf{cpre}\_G^1(\mathsf{attr}\_G^0(U) \cup U). \tag{9}$$

Intuitively, tpre adds all vertices from which Player 0 does not need any cooperation to reach U in every iteration of the fixpoint computation. The interesting observation we make is that substituting the inner pre operator in (7) by tpre does not change the computed set but only accelerates the computation. This is formalized in the next proposition and visualized in Fig. 3.

Proposition 1. Let 𝒢 = (G, □♦U) be a Büchi game and

$$\text{TBüchi}(G, U) = \nu Y. \mu X. \ (U \cap \text{pre}(Y)) \cup (\text{tpre}(X)). \tag{10}$$

Then TBüchi(G, U) = Büchi(G, U) = ⟨⟨0, 1⟩⟩□♦U.

Prop. 1 follows from the correctness proof of (7) by using the observation that for all U ⊆ V we have µX. U ∪ pre(X) = µX. U ∪ tpre(X).
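The observation µX. U ∪ pre(X) = µX. U ∪ tpre(X) can be checked on a small instance. The sketch below implements tpre literally from (9), on top of cpre and attr as defined in (2) and (5); the graph, the `owner` map and all function names are assumptions chosen for illustration.

```python
def pre(edges, target):
    return {v for (v, u) in edges if u in target}


def cpre(edges, owner, a, target):
    out = set()
    for v, player in owner.items():
        succ = {u for (w, u) in edges if w == v}
        if (player == a and succ & target) or (player != a and succ <= target):
            out.add(v)
    return out


def attr(edges, owner, a, target):
    reach = cpre(edges, owner, a, target) | target
    while True:
        nxt = cpre(edges, owner, a, reach) | reach
        if nxt == reach:
            return reach - target   # the target itself is excluded, as in (5)
        reach = nxt


def tpre(edges, owner, target):
    """Together predecessor (9): Player 0 attractor plus one cooperative step."""
    a0 = attr(edges, owner, 0, target)
    return a0 | cpre(edges, owner, 1, a0 | target)


def lfp(step):
    """Least fixed point by iteration; also returns the list of iterates."""
    x, iterates = set(), []
    while True:
        nxt = step(x)
        if nxt == x:
            return x, iterates
        x = nxt
        iterates.append(set(x))


# Hypothetical example: Player 1 owns v1 and v3; t is the target.
OWNER = {"v0": 0, "v1": 1, "v2": 0, "v3": 1, "t": 0}
EDGES = {("v0", "t"), ("v1", "v0"), ("v1", "v1"), ("v2", "v1"),
         ("v3", "v3"), ("t", "t")}
U = {"t"}

SLOW, SLOW_ITERS = lfp(lambda x: U | pre(EDGES, x))
FAST, FAST_ITERS = lfp(lambda x: U | tpre(EDGES, OWNER, x))
```

Both computations reach the same set, but the tpre variant needs fewer iterations on this example because each step absorbs the whole Player 0 attractor at once.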

Fig. 3: Computation of µX. U ∪ pre(X) (left) and µX. U ∪ tpre(X) (right). Each colored region describes one iteration over X. The dotted region on the right is added by the attr part of tpre, and this allows only the vertex v<sub>5</sub> to be in front({v1}). Each set of the same colored edges defines a live transition group.

Computing live group assumptions. Intuitively, the operator tpre<sub>G</sub> computes the union of (i) the set of vertices from which Player 0 can reach U in a finite number of steps with no cooperation from Player 1 and (ii) the set of Player 1 vertices from which Player 0 can reach U with at most one-time cooperation from Player 1. Looking at Fig. 3, case (i) is indicated by the dotted line, while case (ii) corresponds to the last added Player 1 vertex (e.g., v<sub>5</sub>). Hence, we need to capture the cooperation needed by Player 1 only from the vertices added last, which we call the frontier of U in G and formalize as follows:

$$\text{front}(U) \coloneqq \mathsf{tpre}\_G(U) \backslash \mathsf{attr}^0\_G(U). \tag{11}$$

It is easy to see that, indeed, front(U) ⊆ V<sup>1</sup>, as whenever v ∈ front(U) ∩ V<sup>0</sup>, it would have been the case that v ∈ attr<sup>0</sup><sub>G</sub>(U) via (10).

Defining live groups based on frontiers instead of all elements in X<sup>i</sup> indeed yields the desired permissive assumption for Büchi games. By observing that we additionally need to ensure that Player 1 never leaves the cooperative winning region by a simple safety assumption, we get the following result, which is the main contribution of this section.

Theorem 2. Let 𝒢 = (G = (V, E), Φ = □♦U) be a Büchi game with Z<sup>∗</sup> = TBüchi(G, U) and H<sup>ℓ</sup> = {H<sub>i</sub>}<sub>i≥0</sub> s.t.

$$\emptyset \neq H\_i := (front(X^i) \times (X^{i+1} \setminus front(X^i))) \cap E,\tag{12}$$

where X<sup>i</sup> is the set computed in the i-th iteration of the computation over X and in the last iteration of the computation over Y in TBüchi. Then Ψ = Ψ<sub>unsafe</sub>(S) ∧ Ψ<sub>live</sub>(H<sup>ℓ</sup>) is an APA for 𝒢, where S = UnsafeA(G, U). We write LiveA(G, U) to denote the algorithm to construct live groups H<sup>ℓ</sup> as above, which runs in time O(n<sup>3</sup>), where n = |V|.

In fact, there is a faster algorithm for computation of APAs for Büchi games, that runs in time linear in the size of the graph, which we present in the full version [2]. We chose to present the µ-calculus based algorithm here, because it provides more insights into the nature of live groups.

#### 4.4 Co-Liveness Assumptions in Co-Büchi Games

A co-Büchi game is the dual of a Büchi game, where a winning play should visit a designated set of vertices only finitely many times. Formally, a co-Büchi game is a tuple 𝒢 = (G, Φ) where Φ = ♦□U for some U ⊆ V. The standard symbolic algorithm to compute the cooperative winning region is as follows:

$$\text{CoBüchi}(G, U) := \mu X. \nu Y. \ (U \cap \text{pre}(Y)) \cup (\text{pre}(X)). \tag{13}$$

As before, the sets X<sup>i</sup> obtained in the i-th computation of X during the evaluation of (13) carry essential information for constructing assumptions. Intuitively, X<sup>1</sup> gives precisely the set of vertices from which the play can stay in U with Player 1's cooperation and we would like an assumption to capture the fact that we do not want Player 1 to go further away from X<sup>1</sup> infinitely often. This observation is naturally described by so called co-liveness templates.
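The iterates of (13) can be inspected with a direct evaluation of the nested fixpoint. The following is a minimal sketch on an explicit graph; the function name `cobuchi`, the list `iterates` (whose entry at index i − 1 plays the role of X<sup>i</sup>) and the example graph are assumptions made for illustration.

```python
def pre(edges, target):
    """Vertices with at least one successor in target."""
    return {v for (v, u) in edges if u in target}


def cobuchi(vertices, edges, target):
    """Evaluate mu X. nu Y. (target ∩ pre(Y)) ∪ pre(X) and record the X-iterates."""
    x, iterates = set(), []
    while True:
        y = set(vertices)          # inner greatest fixed point restarts from V
        while True:
            nxt = (target & pre(edges, y)) | pre(edges, x)
            if nxt == y:
                break
            y = nxt
        if y == x:
            return x, iterates
        x = y
        iterates.append(set(x))


# Hypothetical example: u1 and u2 form a cycle inside U, s leads into it,
# and w is a sink outside U.
VERTICES = {"s", "u1", "u2", "w"}
EDGES = {("s", "u1"), ("u1", "u2"), ("u2", "u1"), ("u2", "w"), ("w", "w")}
Z_STAR, ITERATES = cobuchi(VERTICES, EDGES, {"u1", "u2"})
```

The first iterate contains exactly the vertices of U from which a play can remain in U forever; later iterates add the vertices that can reach this set.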

Definition 7. Let G = (V, E) be a game graph and D ⊆ V × V a set of edges. Then a co-liveness template over G w.r.t. D is defined by the LTL formula

$$
\Psi\_{\text{COLIVE}}(D) := \lozenge \Box \bigwedge\_{e \in D} \neg e. \tag{14}
$$

The assumptions employing co-liveness templates will be called co-liveness assumptions. With this, we can state the main result of this section.

Theorem 3. Let 𝒢 = (G = (V, E), ♦□U), Z<sup>∗</sup> = CoBüchi(G, U) and

$$D = \left( \left[ (X^1 \cap V^1) \times (Z^\* \backslash X^1) \right] \right.\\ \left. \cup \left[ \bigcup\_{i>1} (X^i \cap V^1) \times (Z^\* \backslash X^{i-1}) \right] \right) \cap E,\tag{15}$$

where X<sup>i</sup> is the set computed in the i-th iteration of fixpoint variable X in CoBüchi. Then Ψ = Ψ<sub>unsafe</sub>(S) ∧ Ψ<sub>colive</sub>(D) is an APA for 𝒢, where S = UnsafeA(G, U). We write CoLiveA(G, U) to denote the algorithm constructing co-live edges D as above, which runs in time O(n<sup>3</sup>), where n = |V|.

We observe that X<sup>1</sup> is a subset of U such that if a play reaches X<sup>1</sup> , Player 0 and Player 1 can cooperatively keep the play in X<sup>1</sup> . To do so, we ensure via the definition of D in (15) that Player 1 can only leave X<sup>1</sup> finitely often. Moreover, with the other co-live edges in D, we ensure that Player 1 can only go away from X<sup>1</sup> finitely often, and hence if Player 0 plays their strategy to reach X<sup>1</sup> and then stay there, the play will be winning. The permissiveness of the assumption comes from the observation that if co-liveness is violated, then Player 1 takes a co-live edge infinitely often, and hence leaves X<sup>1</sup> infinitely often, implying leaving U infinitely often.

We again present a faster algorithm that runs in time linear in size of the graph for computation of APAs for co-Büchi games in the full version [2].

#### 4.5 APA Assumptions for Parity Games

Parity games. Let G = (V, E) be a game graph, and C = {C0, . . . , Ck} be a set of subsets of vertices which form a partition of V . Then the game G = (G, Φ) is called a parity game if

$$\Phi = Parity(C) := \bigwedge\_{i \in \text{odd}[0;k]} \left( \Box \Diamond C\_i \implies \bigvee\_{j \in \text{even}[i+1;k]} \Box \Diamond C\_j \right). \tag{16}$$

The set C is called the priority set and a vertex v in the set C<sup>i</sup> , for i ∈ [1; k], is said to have priority i. An infinite play ρ is winning for Φ = Parity(C) if the highest priority appearing infinitely often along ρ is even.
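For an ultimately periodic play, the priorities seen infinitely often are exactly those on the repeated cycle, so the winning condition can be checked directly. The sketch below compares the "highest priority is even" reading with the reading in which every odd priority seen infinitely often is answered by a higher even priority seen infinitely often; both function names are hypothetical.

```python
from itertools import combinations


def wins_by_max(cycle_priorities):
    """Winning iff the highest priority seen infinitely often is even."""
    return max(cycle_priorities) % 2 == 0


def wins_by_implications(cycle_priorities, k):
    """Winning iff each recurring odd priority i has a recurring even priority j > i."""
    seen = set(cycle_priorities)
    return all(
        any(j in seen for j in range(i + 1, k + 1) if j % 2 == 0)
        for i in seen if i % 2 == 1
    )


def readings_agree(k):
    """Compare both readings on every nonempty set of recurring priorities in [0; k]."""
    for size in range(1, k + 2):
        for seen in combinations(range(k + 1), size):
            if wins_by_max(seen) != wins_by_implications(seen, k):
                return False
    return True
```

For instance, a cycle visiting priorities 1 and 2 is winning, whereas a cycle visiting priorities 2 and 3 is not.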

Conditional live group templates. As seen in the previous sections, for games with simple winning conditions which require visiting a fixed set of edges infinitely or only finitely often, a single assumption (conjoined with a simple safety assumption) suffices to characterize APAs, as there is just one way to win. However, in general parity games, there are usually multiple ways of winning: for example, in parity games with priorities {0, 1, 2}, a play is winning if either (i) the only priority it sees infinitely often is 0, or (ii) it sees priority 1 infinitely often but also sees priority 2 infinitely often. Intuitively, winning option (i) requires the use of co-liveness assumptions as in Sec. 4.4. However, winning option (ii) requires the live group assumptions discussed in Sec. 4.3 to be conditional on whether certain states with priority 1 have actually been visited infinitely often. This is formalized by generalizing live group templates to conditional live group templates.

Definition 8. Let G = (V, E) be a game graph. Then a conditional live group over G is a pair (R, H<sup>ℓ</sup>), where R ⊆ V and H<sup>ℓ</sup> is a live group. Given a set of conditional live groups ℋ<sup>ℓ</sup>, a conditional live group template is the LTL formula

$$\Psi\_{\text{COND}}(\mathcal{H}^{\ell}) := \bigwedge\_{(R, H^{\ell}) \in \mathcal{H}^{\ell}} \left( \Box \Diamond R \implies \Psi\_{\text{LIVE}}(H^{\ell}) \right) . \tag{17}$$

Again, the assumptions employing conditional live group templates will be called conditional live group assumptions. With the generalization of live group assumptions to conditional live group assumptions, we actually have all the ingredients to define an APA for parity games as a conjunction

$$
\Psi = \Psi\_{\text{UNSAFE}}(S) \wedge \Psi\_{\text{COLIVE}}(D) \wedge \Psi\_{\text{COND}}(\mathcal{H}^\ell) \tag{18}
$$

of a safety, a co-liveness, and a conditional live group assumption. Intuitively, we use (i) a safety assumption to prevent Player 1 from leaving the cooperative winning region, (ii) a co-live assumption for each winning option that requires seeing a particular odd priority only finitely often, and (iii) a conditional live group assumption for each winning option that requires seeing an even priority infinitely often if a certain odd priority has been seen infinitely often. The remainder of this section gives an algorithm (Alg. 1) to compute the actual safety, co-live and conditional live group sets S, D and ℋ<sup>ℓ</sup>, respectively, and proves that the resulting assumption Ψ (as in (18)) is actually an APA for the parity game 𝒢.
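The three conjuncts of (18) can be evaluated on an ultimately periodic play, given as a finite prefix followed by a cycle that is repeated forever. The sketch below is illustrative only; the function names, the vertex names and the sets `UNSAFE`, `COLIVE` and `COND` are hypothetical and are not the sets computed for any game in the paper.

```python
def lasso_edges(prefix, cycle):
    """Return (all edges of the play, edges taken infinitely often)."""
    path = prefix + cycle + cycle[:1]
    loop = cycle + cycle[:1]
    return set(zip(path, path[1:])), set(zip(loop, loop[1:]))


def satisfies_assumption(prefix, cycle, unsafe, colive, cond_groups):
    """Check the safety, co-liveness and conditional live group templates on a lasso."""
    all_edges, inf_edges = lasso_edges(prefix, cycle)
    inf_vertices = set(cycle)
    if all_edges & unsafe:         # safety template (6): never take an unsafe edge
        return False
    if inf_edges & colive:         # co-liveness template (14): co-live edges only finitely often
        return False
    for region, groups in cond_groups:
        if not (inf_vertices & region):
            continue               # the condition on R does not trigger
        for group in groups:
            sources = {s for (s, _) in group}
            if inf_vertices & sources and not (inf_edges & group):
                return False       # live group template (8) violated
    return True


UNSAFE = {("a", "x")}
COLIVE = {("b", "c")}
COND = [({"p"}, [{("p", "q")}])]
```

A play that eventually alternates between `p` and `q` satisfies the conditional live group, whereas a play that eventually loops in `p` does not.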

Computing APAs. The computation of unsafe, co-live, and conditional live group sets S, D, and H` to make Ψ in (18) an APA is formalized in Alg. 1. Alg. 1 utilizes the standard fixpoint algorithm Parity(G, C) [12] to compute the cooperative winning region for a parity game G, defined as

$$\text{Parity}(G, C) := \tau X\_d \ldots \nu X\_2. \ \mu X\_1. \ \nu X\_0. \ \bigcup\_{i \in [0;d]} (C\_i \cap \text{pre}(X\_i)), \tag{19}$$

where τ is ν if d is even, and µ otherwise. In addition, Alg. 1 involves the algorithms UnsafeA (Thm. 1), LiveA (Thm. 2), and CoLiveA (Thm. 3) to compute safety, live group, and co-liveness assumptions in an iterative manner. Moreover, G|<sub>U</sub> := ⟨U, U<sup>0</sup>, U<sup>1</sup>, E′⟩ s.t. U<sup>0</sup> := V<sup>0</sup> ∩ U, U<sup>1</sup> := V<sup>1</sup> ∩ U, and E′ := E ∩ (U × U) denotes the restriction of a game graph G := ⟨V, V<sup>0</sup>, V<sup>1</sup>, E⟩ to a subset of its vertices U ⊆ V. Further, C|<sub>U</sub> denotes the restriction of the priority set C from V to U ⊆ V.

```
Algorithm 1 ParityAssumption
```

```
Input: G = (V, E), C : V → {0, 1, . . .}
Output: Ψ
 1: Z∗ ← Parity(G, C)
 2: S ← UnsafeA(G, Z∗)
 3: G ← G|Z∗, C ← C|Z∗
 4: (D, H^ℓ) ← ComputeSets((G, C), ∅, ∅)
 5: return S, D, H^ℓ
 6: procedure ComputeSets((G, C), D, H^ℓ)
 7:   d ← max{i | Ci ≠ ∅}
 8:   if d is odd then
 9:     W¬d ← Parity(G|V\Cd, C)
10:     D ← D ∪ CoLiveA(G, W¬d)
11:   else
12:     Wd ← Büchi(G, Cd), W¬d ← V \ Wd
13:     for all odd i ∈ [0; d] do
14:       H^ℓ ← H^ℓ ∪ (Wd ∩ Ci, LiveA(G|Wd, Ci+1 ∪ Ci+3 ∪ · · · ∪ Cd))
15:   if d > 0 then
16:     G ← G|W¬d, C0 ← C0 ∪ Cd, Cd ← ∅
17:     ComputeSets((G, C), D, H^ℓ)
18:   else
19:     return (D, H^ℓ)
```

Fig. 4: A parity game, where a vertex with priority i has label ci. The dotted edges are the unsafe edges, the dashed edges are the co-live edges, and every similarly colored vertex-edge pair forms a conditional live group.

We illustrate the steps of Alg. 1 by an example depicted in Fig. 4. In line 1, we compute the cooperative winning region Z <sup>∗</sup> of the entire game, to find that the parity condition cannot be satisfied from vertex v<sup>7</sup> even with cooperation, i.e., Z <sup>∗</sup> = {v1, . . . , v6}. So we put the edge (v6, v7) in a safety template, restrict the game to G = G|Z<sup>∗</sup> and run ComputeSets on the new restricted game.

In the new game G, the highest priority is odd (d = 5); hence we execute lines 9-10. Now a play is winning only if it eventually does not see v<sub>5</sub> any more. Hence, in line 9, we find the region W<sub>¬5</sub> = {v1, . . . , v4, v6} of the restricted graph G|<sub>V \ C<sub>5</sub></sub> (containing only nodes v<sub>i</sub> with priority C(v<sub>i</sub>) < 5) from where we can satisfy the parity condition without seeing v<sub>5</sub>. We then make sure that we do not leave W<sub>¬5</sub> to visit v<sub>5</sub> in the game G infinitely often by executing CoLiveA(G, W<sub>¬5</sub>) in line 10, making the edges (v5, v5) and (v6, v5) co-live.

Once we restrict a play from visiting v<sup>5</sup> infinitely often, we only need to focus on satisfying parity without visiting v<sup>5</sup> within W<sup>¬</sup>5. This observation allows us to further restrict our computation to the game G = G|<sup>W</sup>¬<sup>5</sup> in line 16, where we also update the priorities to only range from 0 to 4. In our example this step does not change anything. We then re-execute ComputeSets on this game.

In the restricted graph, the highest priority is 4, which is even; hence we execute lines 12-14. One way of winning in this game is to visit C<sub>4</sub> infinitely often, so we compute the respective cooperative winning region W<sub>4</sub> in line 12. In our example we have W<sub>4</sub> = W<sub>¬5</sub> = {v1, . . . , v4, v6}. Now, to ensure that we actually win from the vertices from which we can cooperatively see priority 4, we have to make sure that every time a lower odd priority is visited infinitely often, a higher even priority is also visited infinitely often. This can be ensured by conditional live group fairness as computed in line 14. For every odd priority i < 4 (i.e., for i = 1 and i = 3), we have to make sure that either 2 or 4 (if i = 1) or 4 (if i = 3) is visited infinitely often. The resulting conditional live groups ℋ<sup>ℓ</sup><sub>i</sub> = (R<sup>i</sup>, H<sup>ℓ</sup><sub>i</sub>) collect all vertices in W<sub>4</sub> with priority i in R<sup>i</sup> and all live groups allowing to see even priorities j with i < j ≤ 4 in H<sup>ℓ</sup><sub>i</sub>, where the latter is computed using the fixpoint algorithm LiveA. The resulting live groups for i = 1 (blue) and i = 3 (red) are depicted in Fig. 4 and given by ({v1}, {(v1, v2)}) and ({v3}, {(v2, v4)}, {(v1, v2)}), respectively.

At this point we have W<sup>¬</sup><sup>4</sup> = ∅, making the game graph computed in line 16 empty, and the algorithm eventually terminates after iteratively removing all priorities from C by running ComputeSets (without any computations, as G is empty) for priorities 3, 2 and 1. In a different game graph, the reasoning done for priorities 5 and 4 above can be repeated for lower priorities if there are other parts of the game graph not contained in W4, from where the game can be won by seeing priority 2 infinitely often. The main insight into the correctness of the outlined algorithm is that all computed assumptions can be conjoined to obtain an APA for the original parity game.

With Alg. 1 in place, we now state the main result of the entire paper.

Theorem 4. Let 𝒢 = (G, Parity(C)) be a parity game such that (S, D, ℋ<sup>ℓ</sup>) = ParityAssumption(G, C). Then Ψ = Ψ<sub>unsafe</sub>(S) ∧ Ψ<sub>colive</sub>(D) ∧ Ψ<sub>cond</sub>(ℋ<sup>ℓ</sup>) is an APA for 𝒢. Moreover, Alg. 1 terminates in time O(n<sup>4</sup>), where n = |V|.

#### 5 Experimental Evaluation

We have developed a C++-based prototype tool SImPA<sup>5</sup> computing Sufficient, Implementable and Permissive Assumptions for Büchi, co-Büchi, and parity games. We first compare SImPA against the closest related tool GIST [9] in Sec. 5.1. We then show that SImPA gives small and meaningful assumptions for the well-known 2-client arbiter synthesis problem from [21] in Sec. 5.2.

<sup>5</sup> Repository URL: https://gitlab.mpi-sws.org/kmallik/simpa

Fig. 5: Running times of SImPA vs GIST (in seconds, log-scale)


Table 1: Summary of the experimental results

#### 5.1 Performance Evaluation

We compare the effectiveness of our tool against a re-implementation of GIST [9], which is not available anymore <sup>6</sup> . GIST originally computes assumptions only enabling a particular initial vertex to become winning for Player 0. However, for the experiments, we run GIST until one of the cooperatively winning vertices is not winning anymore. Since GIST starts with a maximal assumption and shrinks it until a fixed initial vertex is not winning anymore, our modification makes GIST faster as the modified termination condition is satisfied earlier. Owing to the non-dependence of our tool and dependence of GIST on a fixed vertex, this modification allows a fair comparison.

We compared the performance and the quality of the assumptions computed by SImPA and GIST on a set of parity games collected from the SYNTCOMP benchmark suite [1], with a timeout of one hour per game. All the experiments were performed on a computer equipped with Intel(R) Core(TM) i5-10600T CPU @ 2.40GHz and 32 GiB RAM.

We provide all details of the experimental results in the full version [2] and summarize them in Table 1. In addition, Fig. 5 shows a scatter plot, where every instance of the benchmarks is depicted as a point, where the X and the Y coordinates represent the running time for SImPA and GIST (in seconds), respectively. We see that SImPA is computationally much faster than GIST in every instance (all dots lie above the lower red line) – most times by one (above the middle green line) and many times even by two (above the upper orange line) orders of magnitude.

Moreover, in some experiments, GIST fails to compute a sufficient assumption (in the sense of Def. 2), whereas SImPA successfully computes an APA (see the row labeled 'no assumption generated' in Table 1). This is not surprising, as the class of assumptions used by GIST consists only of unsafe edges and live edges (i.e., singleton live groups), which are not expressive enough to provide sufficient assumptions for all parity games (see Fig. 1(b) for a simple example where there is no sufficient assumption that can be expressed using live edges). Furthermore, we note that in all cases where the assumptions computed by GIST are actually APAs, SImPA computes the same assumptions orders of magnitude faster.

<sup>6</sup> The link provided in the paper is broken, and the authors informed us that the implementation is not available.

Fig. 6: Illustration of a relevant part of the game graph for the 2-client arbiter. Rectangles and circles represent Player 1 and Player 0 vertices, respectively. The labels of the Player 0 states indicate the current status of the request and grant bits, and in addition, remember if a request is currently pending using the atomic propositions F1, F2. The double-lined vertices are Büchi vertices, i.e., ones with no pending requests.

#### 5.2 2-Client Arbiter Example

We consider the 2-client arbiter example from the work by Piterman et al. [21], where clients i ∈ {1, 2} (Player 1) can request or free a shared resource by setting the input variables r<sup>i</sup> to true or false, and the arbiter (Player 0) can set the output variables g<sup>i</sup> to true or false to grant or withdraw the shared resource to/from client i. The game graph for this example is implicitly given as part of the specification (as this is a GR(1) synthesis problem [21]). The goal of the arbiter is to ensure that always eventually the requests are granted. This can be depicted by a Büchi game, part of which is presented in Fig. 6. It is known that Player 0 can not win the game without constraining moves of Player 1.

Running SImPA on this example (which took 0.01s) yields two live groups (edges of one live group are indicated by thick red arrows in Fig. 6) that ensure that the play eventually moves to vertices where Player 0 can force a visit to a Büchi vertex. These assumptions are similar to the ones used to restrict the clients' behavior in [21], but are more permissive. Furthermore, running GIST (which took 6.44s) yields several live edges (e.g., <sup>2</sup> − <sup>3</sup> , <sup>7</sup> − <sup>1</sup> ), an assumption that is again less permissive than ours. It turns out that an APA for this example unavoidably requires live groups; singleton live edges, as computed by GIST, do not suffice. For a detailed discussion, we refer the reader to the full version [2].

#### References



## Verification-guided Programmatic Controller Synthesis

Yuning Wang and He Zhu

Rutgers University, New Brunswick NJ, USA {yw895,hz375}@cs.rutgers.edu

Abstract. We present a verification-based learning framework VEL that synthesizes safe programmatic controllers for environments with continuous state and action spaces. The key idea is the integration of program reasoning techniques into controller training loops. VEL performs abstraction-based program verification to reason about a programmatic controller and its environment as a closed-loop system. Based on a novel verification-guided synthesis loop for training, VEL minimizes the amount of safety violation in the proof space of the system, which approximates the worst-case safety loss, using gradient-descent style optimization. Experimental results demonstrate the substantial benefits of leveraging verification feedback for synthesizing provably correct controllers.

### 1 Introduction

Controller search is commonly used to govern cyber-physical systems such as autonomous vehicles, where high assurance is particularly important. Reinforcement Learning (RL) of neural network controllers is a promising approach for controller search [19]. State-of-the-art RL algorithms can learn motor skills autonomously through trial and error in simulated or even unknown environments, thus avoiding tedious manual engineering. However, well-trained neural network controllers may still be unsafe since the RL algorithms do not provide any formal guarantees on safety. A learned controller may fail occasionally but catastrophically, and debugging these failures can be challenging [46].

Guaranteeing the correctness of an RL controller is therefore important. Principally, given an environment model, the correctness of a controller can be verified by reachability analysis over a closed-loop system that combines the environment model and the controller. Indeed, the use of formal verification techniques to aid the design of reliable learning-enabled autonomous systems has risen rapidly over the last few years [43,28,41,18,17]. A natural extended question is that in case verification fails, can we exploit verification feedback in the form of counterexamples to synthesize a verifiably correct controller? This turns out to be a very challenging task due to the following reasons.

if 28.33x<sub>1</sub> + 4.23x<sub>2</sub> + 4.16 ≥ 0 then 6.79x<sub>1</sub> − 8.56x<sub>2</sub> + 0.35 else 11.01x<sub>1</sub> − 13.50x<sub>2</sub> + 8.71

(a) Oscillator Programmatic Controller (b) Oscillator Reachability Analysis

Fig. 1: An oscillator programmatic controller and its reachability analysis. In Fig. 1b, the red region represents the oscillator unsafe set (−0.3, −0.25) × (0.2, 0.35), and the blue region depicts the target set [−0.05, 0.05] × [−0.05, 0.05]. The initial state set of the oscillator is [−0.51, −0.49] × [0.49, 0.51].

Verification Scalability. A counterexample-guided controller synthesizer has to iteratively conduct reachability analysis and controller optimization, as each iteration may discover a new counterexample. However, repeatedly calculating the reachable set of a nonlinear system controlled by a neural network controller over a long horizon is computationally challenging. For example, consider designing a controller for the Van der Pol oscillator system [49]. The oscillator is a 2-dimensional non-linear system whose state transition can be expressed by the following ordinary differential equations:

$$
\dot{x\_1} = x\_2 \qquad \dot{x\_2} = (1 - x\_1^2)x\_2 - x\_1 + u \tag{1}
$$

where (x<sub>1</sub>, x<sub>2</sub>) are the system state variables and u is the control action variable. A feedback controller π(x<sub>1</sub>, x<sub>2</sub>) measures the current system state and then manipulates the control input u as needed to drive the system toward its target. The initial set of the control system is (x<sub>1</sub>, x<sub>2</sub>) ∈ [−0.51, −0.49] × [0.49, 0.51]. As depicted in Fig. 1b, the controlled system is expected to reach the target region in blue while avoiding the obstacle region in red within 120 timesteps (i.e. control steps). In our experience, even for this simple example, using Verisig [28] and ReachNN<sup>∗</sup> [18] (two state-of-the-art verification tools for neural network controlled systems) to calculate the reachable set of a simple 2-layer neural network feedback controller π<sub>NN</sub>(x<sub>1</sub>, x<sub>2</sub>) costs more than 100s each. Repeatedly conducting reachability analysis of a complex neural network controller in a counterexample-guided learning loop is even more costly.

Recently, programmatic controllers have emerged as a promising solution to the lack of interpretability in deep reinforcement learning [47,27,44,38] by training controllers as programs. A programmatic controller for the oscillator environment, learned by a programmatic reinforcement learning algorithm [38], is depicted in Fig. 1a. We depict the decision boundary of the program's conditional statement (28.33x<sub>1</sub> + 4.23x<sub>2</sub> + 4.16 = 0) as a line in Fig. 1b. The program can be interpreted as a decomposition of the reach-avoid learning problem into two sub-problems: the linear controller in the else branch of the program first pushes the system away from the obstacle, and then the linear controller in the then branch takes over to make the system reach the target. As we show in this paper, the compact and structured representation of a programmatic controller makes it amenable to off-the-shelf hybrid or continuous system reachability tools, e.g. [10,20]. Compared with verifying a deep neural network controller, reasoning about a programmatic controller is more feasible. However, the question remains: when verification fails, rather than retraining a new controller, how can we leverage verification feedback to construct a verifiably correct controller?

Proof Space Optimization. The other main challenge of verification-guided controller synthesis is that when verification fails, the counterexample path may provide little help or even be spurious due to approximation errors. This is because reachability analyses typically overapproximate the true reachable sets using a computationally convenient representation such as polytopes [20] or Taylor models [10]. This overapproximation leads to quick error accumulation over time, known as the wrapping effect. Even a well-trained controller may fail verification because of approximation errors. For example, we adapted a state-of-the-art reachability analyzer, Flow<sup>∗</sup> [10], to conduct reachability analysis of the closed-loop system composed of the programmatic controller in Fig. 1a and the oscillator environment (Equation 1), computing a reachable state set for each time interval within the episode horizon (the controller is applied to generate a control action at the start of each time interval). The result is depicted in Fig. 1b. Although the programmatic controller empirically succeeds in reaching the goal on extensive test simulations, the reachability analysis cannot determine whether the target region can always be reached, because it computes a reachable region that keeps expanding, which may be an overestimation caused by over-approximation.

We hypothesize that verification failures can be caused by (1) true counterexamples with unsafe states, (2) states caused by approximation errors, and (3) states within the time interval of each control step (RL algorithms only sample states at the start and the end of a time interval). The latter two kinds of states cannot be observed by an RL algorithm during training in the concrete system state space. Thus, counterexample-guided controller synthesis may not work well if counterexamples are in the form of paths within the concrete state space.

To address this challenge, we propose synthesizing controllers in the proof space of a reachability analyzer. Controller synthesis in the proof space is critical to learning a verified controller because it can leverage verification feedback on either true unsafe counterexample states or approximation errors introduced by the verification procedure to search for a provably correct controller. A counterexample detected by a reachability analyzer is a symbolic rollout of abstract states of the closed-loop system that combines a (fixed) environment model and a (parameterized) programmatic controller. An abstract state (e.g. depicted as a green region in Fig. 1b) at a timestep over-approximates the set of concrete states reachable during the time interval of the timestep. Our approach, VEL, quantifies the safety and reachability property violation by the abstract states, e.g. there is an abstract loss between the over-approximating abstract state and the target region at the last control step. The loss approximates the worst-case reachability loss of any concrete state subsumed by the abstraction. We introduce lightweight gradient-descent style optimization algorithms that optimize controller parameters to reduce the amount of correctness property violation to zero, refuting any verification counterexamples.

Contributions. The main contribution of this paper is twofold. First, we present an efficient controller synthesis approach that integrates formal verification within a programmatic controller learning loop. Second, instead of synthesizing a programmatic controller from concrete state and action samples, we optimize the controller using symbolic rollouts with abstract states obtained by reachability analysis in the verification proof space. We implement the proposed ideas in a tool called VEL and present a detailed experimental study over a range of reinforcement learning systems. Our experiments demonstrate the benefits of integrating formal verification as part of the training objective and using verification feedback for controller synthesis.

### 2 Problem Setup

Environment Models. An environment is a structure M<sup>δ</sup>[·] = (S, A, F : {S × A → S}, R : {S × A → ℝ}, ·) where S is an infinite set of continuous real-vector environment states, which are valuations of the state variables x<sub>1</sub>, x<sub>2</sub>, . . . , x<sub>n</sub> of dimension n (S ⊆ ℝ<sup>n</sup>), and A is a set of continuous real-vector control actions, which are valuations of the action variables u<sub>1</sub>, u<sub>2</sub>, . . . , u<sub>m</sub> of dimension m. F is a state transition function that emits the next environment state given a current state s and an agent action a. We assume that F is defined by an ordinary differential equation (ODE) of the form ẋ = f(x, u), where the function f : ℝ<sup>n</sup> × ℝ<sup>m</sup> → ℝ<sup>n</sup> is Lipschitz continuous in x and continuous in u. R(s, a) is the immediate reward after a transition from an environment state s ∈ S with action a ∈ A. An environment M<sup>δ</sup>[·] is parameterized with an (unknown) controller.

Controllers. An agent uses a controller to interact with an environment M<sup>δ</sup>[·]. We explicitly model the deployment of a (learned) controller π : {S → A} in M<sup>δ</sup>[·] as a closed-loop system M<sup>δ</sup>[π]. The controller π determines which action the agent ought to take in a given environment state. Specifically, it is invoked once every δ time period, i.e. at each timestep. π reads the environment state s<sub>i</sub> = s(iδ) at time t = iδ (i = 0, 1, 2, . . .), or timestep i, and computes a control action a<sub>i</sub> = a(iδ) = π(s(iδ)). The environment then evolves following the ODE ẋ = f(x, a(iδ)) within the time period [iδ, (i + 1)δ] and reaches the state s<sub>i+1</sub> = s((i + 1)δ) at the next timestep i + 1. In the oscillator example from Sec. 1, the duration δ of a timestep is 0.05s and the time horizon is 6s (i.e. 120 timesteps).

For environment simulation, given a set of initial states S<sub>0</sub>, we assume the existence of a flow function<sup>1</sup> φ(s<sub>0</sub>, t) : S<sub>0</sub> × ℝ<sup>+</sup> → S that maps some initial state s<sub>0</sub> to the environment state φ(s<sub>0</sub>, t) at time t, where φ(s<sub>0</sub>, 0) = s<sub>0</sub>. We note that φ is the solution of the ODE ẋ = f(x, a(iδ)) in the state transition function F during the time period [iδ, (i + 1)δ], with a(iδ) = π(φ(s<sub>0</sub>, iδ)).

<sup>1</sup> φ may be implemented using scipy.integrate.odeint (or scipy.integrate.solve_ivp).
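As a rough illustration of the sampling scheme above, the following sketch integrates the oscillator ODE (Eq. 1) one timestep at a time with `scipy.integrate.solve_ivp`, holding the control action constant within each timestep. The helper names (`oscillator`, `rollout`, `linear_feedback`) and the linear feedback gains are hypothetical placeholders chosen for illustration; they are not the controller learned in this paper.

```python
import numpy as np
from scipy.integrate import solve_ivp


def oscillator(t, x, u):
    """Van der Pol dynamics from Eq. (1) with a constant control input u."""
    x1, x2 = x
    return [x2, (1.0 - x1 ** 2) * x2 - x1 + u]


def rollout(controller, s0, delta=0.05, horizon=120):
    """Sample s_0, ..., s_T by integrating the ODE over one timestep at a time."""
    states = [np.asarray(s0, dtype=float)]
    for _ in range(horizon):
        s = states[-1]
        u = controller(s)  # a_i = pi(s_i), held constant on [i*delta, (i+1)*delta]
        sol = solve_ivp(oscillator, (0.0, delta), s, args=(u,),
                        rtol=1e-8, atol=1e-10)
        states.append(sol.y[:, -1])  # state at the end of the timestep
    return np.array(states)


def linear_feedback(s):
    """Hypothetical placeholder controller (assumption, not from the paper)."""
    return -1.0 * s[0] - 2.0 * s[1]


traj = rollout(linear_feedback, s0=[-0.5, 0.5])
print(traj.shape)  # one row per sampled state s_0, ..., s_T
```

Note that such a simulation only observes states at timestep boundaries, which is exactly the limitation that motivates reasoning about all intermediate states during verification.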

Reinforcement Learning (RL). Given a set of initial states S<sub>0</sub> and a time horizon Tδ (T > 0) with δ as the duration of a timestep, a T-timestep rollout ζ of a controller π is denoted as (ζ = s<sub>0</sub>, a<sub>0</sub>, s<sub>1</sub>, . . . , s<sub>T</sub>) ∼ π where s<sub>i</sub> = s(iδ) and a<sub>i</sub> = a(iδ) are the environment state and the action taken at timestep i such that s<sub>0</sub> ∈ S<sub>0</sub>, s<sub>i+1</sub> = F(s<sub>i</sub>, a<sub>i</sub>), and a<sub>i</sub> = π(s<sub>i</sub>). The aggregate reward of π is

$$J^R(\pi) = \mathbb{E}\_{\left(\zeta = s\_0, a\_0, \dots, s\_T\right) \sim \pi} \Big[ \sum\_{t=0}^T \beta^t R(s\_t, a\_t) \Big] \tag{2}$$

where β is the reward discount factor (0 < β ≤ 1). Controller search via RL aims to produce a controller π that maximizes J<sup>R</sup>(π).
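To make Equation 2 concrete, the sketch below computes the discounted return of a single rollout and a Monte Carlo estimate of the aggregate reward from several sampled rollouts. The function names and the reward sequences are illustrative assumptions; in practice the rewards would come from simulating the closed-loop system.

```python
def discounted_return(rewards, beta):
    """Sum of beta**t * r_t over one rollout (the term inside the expectation)."""
    return sum((beta ** t) * r for t, r in enumerate(rewards))


def estimate_aggregate_reward(rollout_rewards, beta):
    """Monte Carlo estimate of J^R: mean discounted return over sampled rollouts."""
    returns = [discounted_return(rewards, beta) for rewards in rollout_rewards]
    return sum(returns) / len(returns)


# Hypothetical reward sequences from two sampled rollouts.
samples = [[1.0, 1.0, 1.0], [0.0, 0.0, 4.0]]
print(estimate_aggregate_reward(samples, beta=0.5))
```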

Controller Correctness Specification. A correctness specification of a controller is a logical formula specifying whether any rollout ζ of the controller accomplishes the task without violating safety properties and reachability properties. To define safety and reachability over rollouts, the user first specifies a set of atomic predicates over environment states s.

Definition 1 (Predicates). A predicate ϕ is a quantifier-free Boolean combination of linear inequalities over the environment state variables x:

$$
\varphi ::= A \cdot x \le b \ \mid \ \varphi\_1 \wedge \varphi\_2 \ \mid \ \varphi\_1 \vee \varphi\_2
$$

A state s ∈ S satisfies a predicate ϕ, denoted as s |= ϕ, iff ϕ(s) is true.

The correctness requirement of a controller goes beyond predicates over environment states s to specifications over controller rollouts ζ.

Definition 2 (Rollout Specifications). The syntax of our correctness specifications for RL controllers is defined as:

$$
\psi ::= \varphi\_I \ \mathsf{reach} \ \varphi\_1 \ \mathsf{ensuring} \ \varphi\_2
$$
In a rollout specification, ϕ<sub>I</sub> reach ϕ<sub>1</sub> enforces reachability: the controlled agent should eventually reach some goal states evaluated true by the predicate ϕ<sub>1</sub> from an initial state that satisfies ϕ<sub>I</sub>. For instance, the agent should achieve some goals from an initial state. The constraint ensuring ϕ<sub>2</sub> additionally enforces safety: any rollout of the controller should only visit safe states evaluated true by the predicate ϕ<sub>2</sub>. For example, the agent should remain within a safety boundary or avoid any obstacles throughout a rollout. Formally, the semantics of a rollout specification ψ is defined as follows:

$$\llbracket \varphi\_I \ \mathsf{reach} \ \varphi\_1 \ \mathsf{ensuring} \ \varphi\_2 \rrbracket (\zeta\_{0:T}) = \varphi\_1(s\_T) \ \wedge \ \left( \forall \ 0 \le i \le T. \ \varphi\_2(s\_i) \right)$$

where ζ<sub>0:T</sub> = s<sub>0</sub>, s<sub>1</sub>, . . . , s<sub>T</sub> is a rollout such that s<sub>0</sub> ∈ ϕ<sub>I</sub> and T > 0 denotes the total number of timesteps. Our specification implicitly requires that if the target region is reached before timestep T of a rollout, the controlled agent must still be inside the target region at the end of the rollout.

Given a time horizon Tδ (T > 0), a controller π is correct for an environment M<sup>δ</sup>[·] with respect to a rollout specification ψ ::= ϕ<sub>I</sub> reach ϕ<sub>1</sub> ensuring ϕ<sub>2</sub> iff for any rollout ζ<sub>0:T</sub> = s<sub>0</sub>, s<sub>1</sub>, . . . , s<sub>T−1</sub>, s<sub>T</sub> of M<sup>δ</sup>[π] such that ϕ<sub>I</sub>(s<sub>0</sub>) holds, ⟦ψ⟧(ζ<sub>0:T</sub>) is true. Notice that this definition does not consider any states of the continuous environment occurring within the time period of a timestep.

Example 1 We continue the oscillator example. Assume the oscillator's initial state satisfies (x<sub>1</sub>, x<sub>2</sub>) ∈ [−0.51, −0.49] × [0.49, 0.51]. We specify the initial state constraint as:

$$
\varphi\_I(x\_1, x\_2) \equiv -0.51 \le x\_1 \le -0.49 \land 0.49 \le x\_2 \le 0.51
$$

The unsafe set of the oscillator is (−0.3, −0.25) × (0.2, 0.35) (depicted as the red region in Fig. 1b). The safety property ϕ<sub>safe</sub> of the system is specified as:

$$
\varphi\_{safe}(x\_1, x\_2) \equiv x\_1 \le -0.3 \lor x\_1 \ge -0.25 \lor x\_2 \le 0.2 \lor x\_2 \ge 0.35
$$

For this example, the target region is [−0.05, 0.05]×[−0.05, 0.05] (the blue region in Fig. 1b). The reachability of the system ϕreach is specified as:

$$
\varphi\_{reach}(x\_1, x\_2) \equiv -0.05 \le x\_1 \le 0.05 \land -0.05 \le x\_2 \le 0.05
$$

The target region should be eventually reached by the end of a control episode while avoiding the unsafe state region. We express the rollout specification as:

$$
\varphi\_I(x\_1, x\_2) \ \mathsf{reach} \ \varphi\_{reach}(x\_1, x\_2) \ \mathsf{ensuring} \ \varphi\_{safe}(x\_1, x\_2)
$$

The following specification formulates that a desired controller stabilizes the oscillator around the target region over an infinite time horizon:

ϕreach(x1, x2) reach ϕreach(x1, x2) ensuring ϕsafe(x1, x2)
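The predicates of Example 1 and the rollout semantics of Definition 2 can be sketched directly in code. The function names below are our own, and the two short rollouts are hypothetical hand-written state sequences (not simulation results); the check evaluates the goal predicate on the final state and the safety predicate on every state.

```python
def phi_init(x1, x2):
    return -0.51 <= x1 <= -0.49 and 0.49 <= x2 <= 0.51


def phi_safe(x1, x2):
    # Complement of the open unsafe box (-0.3, -0.25) x (0.2, 0.35).
    return x1 <= -0.3 or x1 >= -0.25 or x2 <= 0.2 or x2 >= 0.35


def phi_reach(x1, x2):
    return -0.05 <= x1 <= 0.05 and -0.05 <= x2 <= 0.05


def satisfies(rollout, reach, safe):
    """Goal predicate at s_T and safety predicate at every sampled s_i."""
    return reach(*rollout[-1]) and all(safe(*s) for s in rollout)


# Hypothetical rollouts, both starting from a state satisfying phi_init.
good = [(-0.5, 0.5), (-0.2, 0.4), (0.0, 0.0)]
bad = [(-0.5, 0.5), (-0.28, 0.3), (0.0, 0.0)]  # second state is inside the obstacle

print(satisfies(good, phi_reach, phi_safe), satisfies(bad, phi_reach, phi_safe))
```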

### 3 Programmatic Controllers

Programmatic controllers have emerged as a promising solution to address the lack of interpretability in deep reinforcement learning [47,38,27,8] by learning controllers as programs. This paper focuses on programmatic controllers structured as differentiable programs [38].

Our programmatic controllers follow the high-level context-free grammar depicted in Fig. 2, where E is the start symbol and θ represents real-valued parameters of the program. The nonterminals E and B stand for program expressions that evaluate to action values in ℝ<sup>m</sup> and Booleans, respectively, where m is the action dimension size, θ<sub>1</sub> ∈ ℝ and θ<sub>2</sub> ∈ ℝ<sup>n</sup>. We represent a state input to a programmatic controller as s = {x<sub>1</sub> : ν<sub>1</sub>, x<sub>2</sub> : ν<sub>2</sub>, . . . , x<sub>n</sub> : ν<sub>n</sub>} where n is the state dimension size and ν<sub>i</sub> = s[x<sub>i</sub>] is the value of x<sub>i</sub> in s. As usual, the unbound variables in X = [x<sub>1</sub>, x<sub>2</sub>, . . . , x<sub>n</sub>] are assumed to be input variables (i.e., state variables). C is a low-level affine controller that can be invoked by a programmatic controller, where θ<sub>3</sub>, θ<sub>c</sub> ∈ ℝ<sup>m</sup> and θ<sub>4</sub> ∈ ℝ<sup>m·n</sup> are controller parameters. Notice that C can be as simple as some (learned) constants θ<sub>c</sub>.

$$\begin{aligned} E &::= C \ \mid \ \text{if } B \text{ then } C \text{ else } E \\ B &::= \theta\_1 + \theta\_2^T \cdot X \ge 0 \\ C &::= \theta\_3 + \theta\_4 \cdot X \ \mid \ \theta\_c \end{aligned}$$

Fig. 2: A context-free grammar for programmatic controllers.

The semantics of a programmatic controller in E is mostly standard and given by a function ⟦E⟧(s), defined for each language construct. For example, ⟦x<sub>i</sub>⟧(s) = s[x<sub>i</sub>] reads the value of a variable x<sub>i</sub> in a state s. A controller may use an if-then-else branching construct. To avoid discontinuities and preserve differentiability, we interpret its semantics in terms of a smooth approximation:

$$\llbracket \text{if } B \text{ then } C \text{ else } E \rrbracket(s) = \sigma(\llbracket B \rrbracket(s)) \cdot \llbracket C \rrbracket(s) + \left(1 - \sigma(\llbracket B \rrbracket(s))\right) \cdot \llbracket E \rrbracket(s) \tag{3}$$

where σ is the sigmoid function. Thus, any controller programmed in this grammar is a differentiable program. During execution, a programmatic controller invokes a set of low-level affine controllers under different environment conditions, according to the activation of the B conditions in the program.
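As a small illustration of Equation 3, the sketch below evaluates the oscillator controller of Fig. 1a under the smooth semantics, blending the two affine branches with a sigmoid weight on the branch condition. The function names are ours, and treating ⟦B⟧(s) as the raw affine value 28.33x<sub>1</sub> + 4.23x<sub>2</sub> + 4.16 (without any additional scaling) is an assumption of this sketch.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def oscillator_controller(x1, x2):
    """Smooth if-then-else semantics (Eq. 3) for the program in Fig. 1a."""
    condition = 28.33 * x1 + 4.23 * x2 + 4.16      # value of the B expression
    then_branch = 6.79 * x1 - 8.56 * x2 + 0.35     # affine controller C
    else_branch = 11.01 * x1 - 13.50 * x2 + 8.71   # affine controller in E
    weight = sigmoid(condition)
    return weight * then_branch + (1.0 - weight) * else_branch


# Near the initial region the condition is strongly negative, so the
# output is dominated by the else branch.
print(oscillator_controller(-0.5, 0.5))
```

Because the blend is a composition of affine maps and a sigmoid, the output is differentiable in both the state and the program parameters.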

Programmatic Reinforcement Learning. We use the programmatic reinforcement learning algorithm [38] to learn a programmatic controller. Compared with other programmatic reinforcement learning approaches [27,47], this algorithm stands out by jointly learning both program structures and program parameters. Empirical results show that learned programmatic controllers achieve comparable or even better reward performance than deep neural networks [38].

### 4 Proof Space Optimization

The main challenge of using a verification procedure to guide controller synthesis is that verifiers are in general incomplete. When verification fails, it does not necessarily mean that the system under verification has a true counterexample, as the verifier may introduce states caused by over-approximation errors, which are common in reachability analysis. Even a well-trained controller may fail verification because of approximation errors. In our context, for soundness, reachability analysis of continuous or hybrid systems additionally takes into account environment states within the time interval of a timestep. Neither kind of state can be observed by RL agents during training in the concrete state space, which motivates controller optimization in the proof space of verification. In the following, Sec. 4.1 defines a verification procedure for environment models governed by programmatic controllers. Sec. 4.2 encodes verification feedback as a loss function of controller parameters over the verification proof space. Finally, Sec. 4.3 defines an optimization procedure that iteratively minimizes the loss function for correct-by-construction controller synthesis.

#### 4.1 Controller Verification

We formalize controller synthesis as a verification-based controller optimization problem. A synthesized controller π is certified by a formal verifier against an environment model M<sup>δ</sup> [·] and a rollout specification ψ (Definition 2). The verifier returns true if π can be verified correct.

Reinforcement learning algorithms typically discretize a continuous environment model M<sup>δ</sup>[·] to sample environment states every δ time period (as a timestep) for controller learning (Sec. 2). For soundness, in verification our approach instead considers all states reachable by the original continuous system. Formally, given a set of initial states S<sub>0</sub>, we use S<sub>i</sub> (i > 0) to represent the set of reachable concrete states during the time interval [(i − 1)δ, iδ]:

$$S\_i = \{ \phi(s\_0, t) \mid \forall s\_0 \in S\_0, \forall t \in [(i - 1)\delta, \ i\delta] \}$$

where φ is the flow function for environment state transition defined in Sec. 2. Our algorithm uses abstract interpretation to soundly approximate the set of reachable states S<sub>i</sub> at each timestep by reachability analysis.

Definition 3 (Symbolic Rollouts). Given an environment model M<sup>δ</sup>[π] = (S, A, F, R, π) deployed with a controller π, a set of initial states S<sub>0</sub>, and an abstract domain D, a symbolic rollout of M<sup>δ</sup>[π] over D is ζ<sup>D</sup> = S<sup>D</sup><sub>0</sub>, S<sup>D</sup><sub>1</sub>, . . . where S<sup>D</sup><sub>0</sub> = α(S<sub>0</sub>) is the abstraction of the initial states S<sub>0</sub> in D. Each symbolic state S<sup>D</sup><sub>i</sub> = F<sup>D</sup>[π](S<sup>D</sup><sub>i−1</sub>) over-approximates S<sub>i</sub>, the set of reachable states from the initial states S<sub>0</sub> during the time interval [(i − 1)δ, iδ] of timestep i. F<sup>D</sup> is an abstract transformer for M<sup>δ</sup>[π]'s state transition function F.

Our implementation of the abstract interpreter F<sup>D</sup> is based on Flow<sup>∗</sup> [10], a reachability analyzer for continuous or hybrid systems, where the abstract domain D is Taylor Model (TM) flowpipes. Formally, for reachability computation at each timestep i (where i > 0), we first use Flow<sup>∗</sup> to evaluate the TM flowpipe Ŝ<sub>i−1</sub> for the reachable set of states at time t = (i − 1)δ. To obtain a TM representation for the output set of the programmatic controller at timestep i, we use TM arithmetic to evaluate a TM flowpipe Â<sub>i−1</sub> for ⟦π⟧(s) for all states s ∈ Ŝ<sub>i−1</sub>. Here ⟦π⟧ encodes the semantics of π (Equation 3). For example, the semantics of the oscillator controller in Fig. 1a is:

$$\begin{aligned} &\sigma(28.33x\_1 + 4.23x\_2 + 4.16) \times (6.79x\_1 - 8.56x\_2 + 0.35) \\ &+ \ (1 - \sigma(28.33x\_1 + 4.23x\_2 + 4.16)) \times (11.01x\_1 - 13.50x\_2 + 8.71) \end{aligned}$$

where the sigmoid function σ can be handled by TM arithmetic. The resulting TM representation Â<sub>i−1</sub> can be viewed as an overapproximation of the controller's output at timestep i. Finally, we use Flow<sup>∗</sup> to construct the TM flowpipe overapproximation S<sup>D</sup><sub>i</sub> for all reachable states during the time period at timestep i by reachability analysis over the ODE dynamics of the transition function ẋ = f(x, a) for δ time period with initial state x(0) ∈ Ŝ<sub>i−1</sub> and the control action a ∈ Â<sub>i−1</sub>.

Verification Procedure. Given a closed-loop system M<sup>δ</sup>[π], a time horizon Tδ (T > 0), and a rollout specification ψ ::= ϕ<sub>I</sub> reach ϕ<sub>1</sub> ensuring ϕ<sub>2</sub>, we obtain the symbolic rollout of M<sup>δ</sup>[π] as ζ<sup>D</sup><sub>0:T</sub> = S<sup>D</sup><sub>0</sub>, S<sup>D</sup><sub>1</sub>, . . . , S<sup>D</sup><sub>T</sub> where S<sup>D</sup><sub>0</sub> is the abstraction of all states in ϕ<sub>I</sub> in the abstract domain D. For formal verification, we extend the semantics of the rollout specification ⟦ψ⟧ over concrete rollouts (Definition 2) to support symbolic rollouts. Formally, ⟦ψ⟧(ζ<sup>D</sup><sub>0:T</sub>) holds iff:

$$\big( \forall s \in \gamma(S\_T^{\mathcal{D}}).\ \varphi\_1(s) \big) \ \bigwedge \ \big( \forall\ 0 \le i \le T,\ s \in \gamma(S\_i^{\mathcal{D}}).\ \varphi\_2(s) \big)$$

where γ is the concretization function of the abstract domain D. The closed-loop system M<sup>δ</sup>[π] satisfies ψ, denoted as M<sup>δ</sup>[π] |= ψ, iff ⟦ψ⟧(ζ<sup>D</sup><sub>0:T</sub>) holds. The abstract domain D is the proof space of controller verification.

Example 2 To verify the closed-loop system composed of the oscillator ODE in Eq. 1 and the learned controller in Fig. 1a, we conducted reachability analysis to overapproximate the reachable state set during the time period of each timestep within the episode horizon. The resulting TM flowpipes are depicted as a sequence of green regions in Fig. 1b. Due to the approximation errors, the verification procedure cannot guarantee that the target is eventually reached.

#### 4.2 Correctness Property Loss in the Proof Space

To facilitate controller optimization in the presence of verification failures, our approach measures the amount of correctness property violation as verification feedback. To this end, we first define correctness property violation over the concrete environment state space and then lift this definition to the proof space of controller verification.

We note that a controller rollout that fails correctness property verification violates desired properties at some states. The following definition characterizes a correctness loss function to quantify the correctness property violation of a state.

Definition 4 (State Correctness Loss Function). For a predicate ϕ over states s ∈ S, we define a non-negative loss function L(s, ϕ) such that L(s, ϕ) = 0 iff s satisfies ϕ, i.e. s |= ϕ. We define L(s, ϕ) recursively, based on the possible shapes of ϕ (Definition 1):

- L(s, A · x ≤ b) := max(A · s − b, 0)
- L(s, ϕ<sub>1</sub> ∧ ϕ<sub>2</sub>) := max(L(s, ϕ<sub>1</sub>), L(s, ϕ<sub>2</sub>))
- L(s, ϕ<sub>1</sub> ∨ ϕ<sub>2</sub>) := min(L(s, ϕ<sub>1</sub>), L(s, ϕ<sub>2</sub>))

Notice that L(s, ϕ<sub>1</sub> ∧ ϕ<sub>2</sub>) = 0 iff L(s, ϕ<sub>1</sub>) = 0 and L(s, ϕ<sub>2</sub>) = 0, and similarly L(s, ϕ<sub>1</sub> ∨ ϕ<sub>2</sub>) = 0 iff L(s, ϕ<sub>1</sub>) = 0 or L(s, ϕ<sub>2</sub>) = 0.
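The recursive definition above translates almost line by line into code. In the following sketch, predicates are encoded as nested tuples, which is purely our own illustrative encoding (the helper names `le`, `conj`, and `loss` are assumptions), and the target predicate ϕ<sub>reach</sub> from Example 1 serves as a test case.

```python
from functools import reduce


def le(A, b):
    """Atomic predicate A . x <= b."""
    return ("le", tuple(A), b)


def conj(*preds):
    """Conjunction of one or more predicates."""
    return reduce(lambda p, q: ("and", p, q), preds)


def loss(s, phi):
    """State correctness loss L(s, phi); zero iff s satisfies phi."""
    kind = phi[0]
    if kind == "le":
        _, A, b = phi
        return max(sum(a * x for a, x in zip(A, s)) - b, 0.0)
    _, left, right = phi
    if kind == "and":
        return max(loss(s, left), loss(s, right))
    if kind == "or":
        return min(loss(s, left), loss(s, right))
    raise ValueError(f"unknown predicate kind: {kind}")


# phi_reach from Example 1: -0.05 <= x1 <= 0.05 and -0.05 <= x2 <= 0.05.
phi_reach = conj(le((1, 0), 0.05), le((-1, 0), 0.05),
                 le((0, 1), 0.05), le((0, -1), 0.05))

print(loss((0.0, 0.0), phi_reach), loss((0.2, 0.0), phi_reach))
```

A state inside the target box has zero loss, while a state outside is penalized by how far it violates the most-violated bound.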

Our objective is to use verification feedback to improve controller safety. To this end, we lift the correctness loss function over concrete states (Definition 4) to an abstract correctness loss function over abstract states.

Definition 5 (Abstract State Correctness Loss Function). Given an abstract state S<sup>D</sup> and a predicate ϕ, we define an abstract correctness loss function:

$$\mathcal{L}\_{\mathcal{D}}(S^{\mathcal{D}}, \varphi) = \max\_{s \in \gamma(S^{\mathcal{D}})} \mathcal{L}(s, \varphi).$$

where γ is the concretization function of the abstract domain D. The abstract correctness loss function applies γ to obtain all concrete states represented by an abstract state S<sup>D</sup>. It measures the worst-case correctness loss of ϕ among all concrete states subsumed by S<sup>D</sup>. Given an abstract domain D, we can usually approximate the concretization of an abstract state γ(S<sup>D</sup>) with a tight interval γ<sub>I</sub>(S<sup>D</sup>). As exemplified in Fig. 1b, it is straightforward to represent Taylor model flowpipes as intervals in Flow<sup>∗</sup>. Based on the possible shape of ϕ, we redefine L<sub>D</sub>(S<sup>D</sup>, ϕ) as:

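- L<sub>D</sub>(S<sup>D</sup>, A · x ≤ b) := max<sub>s ∈ γ<sub>I</sub>(S<sup>D</sup>)</sub> max(A · s − b, 0)
- L<sub>D</sub>(S<sup>D</sup>, ϕ<sub>1</sub> ∧ ϕ<sub>2</sub>) := max(L<sub>D</sub>(S<sup>D</sup>, ϕ<sub>1</sub>), L<sub>D</sub>(S<sup>D</sup>, ϕ<sub>2</sub>))
- L<sub>D</sub>(S<sup>D</sup>, ϕ<sub>1</sub> ∨ ϕ<sub>2</sub>) := min(L<sub>D</sub>(S<sup>D</sup>, ϕ<sub>1</sub>), L<sub>D</sub>(S<sup>D</sup>, ϕ<sub>2</sub>))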
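For interval (box) concretizations, the inner maximization in the atomic case has a closed form: a linear function attains its maximum over a box at the corner that picks the upper bound for each non-negative coefficient and the lower bound otherwise. The sketch below uses this fact. The tuple encoding of predicates and the example boxes are our own assumptions for illustration and are not outputs of Flow<sup>∗</sup>.

```python
from functools import reduce


def le(A, b):
    """Atomic predicate A . x <= b."""
    return ("le", tuple(A), b)


def conj(*preds):
    return reduce(lambda p, q: ("and", p, q), preds)


def disj(*preds):
    return reduce(lambda p, q: ("or", p, q), preds)


def abstract_loss(box, phi):
    """Abstract loss over an interval box given as [(lo, hi), ...] per variable."""
    kind = phi[0]
    if kind == "le":
        _, A, b = phi
        # Worst case of A . s over the box: choose hi when a >= 0, else lo.
        worst = sum(a * (hi if a >= 0 else lo) for a, (lo, hi) in zip(A, box))
        return max(worst - b, 0.0)
    _, left, right = phi
    if kind == "and":
        return max(abstract_loss(box, left), abstract_loss(box, right))
    if kind == "or":
        return min(abstract_loss(box, left), abstract_loss(box, right))
    raise ValueError(f"unknown predicate kind: {kind}")


# Predicates from Example 1.
phi_reach = conj(le((1, 0), 0.05), le((-1, 0), 0.05),
                 le((0, 1), 0.05), le((0, -1), 0.05))
phi_safe = disj(le((1, 0), -0.3), le((-1, 0), 0.25),
                le((0, 1), 0.2), le((0, -1), -0.35))

inside_target = [(-0.02, 0.03), (-0.01, 0.04)]   # hypothetical abstract state
overlaps_obstacle = [(-0.32, -0.27), (0.25, 0.3)]  # hypothetical abstract state
print(abstract_loss(inside_target, phi_reach),
      abstract_loss(overlaps_obstacle, phi_safe))
```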

Theorem 1 (Abstract State Correctness Loss Function Soundness). Given an abstract state S<sup>D</sup> and a predicate ϕ, we have:

$$\mathcal{L}\_{\mathcal{D}}(S^{\mathcal{D}}, \varphi) = 0 \implies \forall s \in \gamma\_I(S^{\mathcal{D}}).\ s \models \varphi.$$

We further lift the definition of the correctness loss function over abstract states (Definition 5) to a correctness loss function over symbolic rollouts.

Definition 6 (Symbolic Rollout Correctness Loss). Given a rollout specification ψ := ϕ<sub>I</sub> reach ϕ<sub>1</sub> ensuring ϕ<sub>2</sub> and a symbolic rollout ζ<sup>D</sup><sub>0:T</sub> = S<sup>D</sup><sub>0</sub>, . . . , S<sup>D</sup><sub>T</sub> where S<sup>D</sup><sub>0</sub> is the abstraction of all states in ϕ<sub>I</sub> in the abstract domain D, we define an abstract safety loss function L<sub>D</sub>(ζ<sup>D</sup><sub>0:T</sub>, ψ) measuring the degree to which the rollout specification is violated:

$$\mathcal{L}\_{\mathcal{D}}(\zeta\_{0:T}^{\mathcal{D}},\ \varphi\_I \ \mathsf{reach} \ \varphi\_1 \ \mathsf{ensuring} \ \varphi\_2) = \max\Big( \mathcal{L}\_{\mathcal{D}}(S\_T^{\mathcal{D}}, \varphi\_1),\ \max\_{0 < i \le T} \mathcal{L}\_{\mathcal{D}}(S\_i^{\mathcal{D}}, \varphi\_2) \Big)$$

Definition 6 enables a quantitative metric for the correctness loss of a controller in the verification proof space. Given a closed-loop system M<sup>δ</sup>[π], a time horizon Tδ, a rollout specification ψ, and the corresponding symbolic rollout ζ<sup>D</sup><sub>0:T</sub> of M<sup>δ</sup>[π], the correctness loss of M<sup>δ</sup>[π] with respect to ψ, denoted as L<sub>D</sub>(M<sup>δ</sup>[π], ψ), is defined over the symbolic rollout, i.e. L<sub>D</sub>(M<sup>δ</sup>[π], ψ) = L<sub>D</sub>(ζ<sup>D</sup><sub>0:T</sub>, ψ).

Example 3 In Fig. 1b, there is a correctness loss (depicted as a red arrow) between the abstract state at the last timestep of the oscillator symbolic rollout and the desired reachable region ϕreach defined in Example 1. We characterize it as an abstract state correctness loss. The whole symbolic rollout has the same correctness loss with respect to the rollout specification defined in Example 1.
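A minimal sketch of the symbolic rollout loss of Definition 6 is given below, with abstract states approximated by interval boxes. The box values are made up for illustration (they are not the flowpipes of Fig. 1b), the helper names are ours, and the safety loss is a placeholder that assumes every box is safe, so the rollout loss reduces to the reachability gap at the last timestep, as in Example 3.

```python
def box_reach_loss(box, target):
    """Largest amount by which `box` sticks out of `target` along any bound.

    Both arguments are lists of (lo, hi) intervals, one per state variable.
    """
    return max(max(hi - t_hi, t_lo - lo, 0.0)
               for (lo, hi), (t_lo, t_hi) in zip(box, target))


def rollout_loss(boxes, reach_loss, safe_loss):
    """Max of the final-state reach loss and the safety loss over S_1..S_T."""
    final = reach_loss(boxes[-1])
    safety = max(safe_loss(box) for box in boxes[1:])
    return max(final, safety)


target = [(-0.05, 0.05), (-0.05, 0.05)]
# Hypothetical symbolic rollout S_0, S_1, S_2 as interval boxes.
boxes = [
    [(-0.51, -0.49), (0.49, 0.51)],
    [(-0.20, -0.10), (0.10, 0.30)],
    [(-0.04, 0.09), (-0.03, 0.02)],
]

total = rollout_loss(
    boxes,
    reach_loss=lambda box: box_reach_loss(box, target),
    safe_loss=lambda box: 0.0,  # placeholder: assumes all boxes are safe
)
print(total)
```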

Theorem 2 (Symbolic Rollout Correctness Soundness). Given an environment M<sup>δ</sup> [·] deployed with a controller π and a rollout specification ψ, we have

$$\mathcal{L}\_{\mathcal{D}}(M^{\delta}[\pi], \psi) = 0 \implies M^{\delta}[\pi] \models \psi.$$

Algorithm 1 VEL: Verification-based learning framework for controller synthesis. In line 8, ω<sub>k</sub> is a Gaussian noise and ν is a small positive real number.

Require: Environment model M<sup>δ</sup>[·], rollout specification ψ, initial controller π<sub>θ</sub> trained using the programmatic RL algorithm [38].

Ensure: Optimized controller π<sub>θ</sub> such that M<sup>δ</sup>[π<sub>θ</sub>] |= ψ.

```
1: procedure VEL
2:   θ ← all parameters in π_θ for optimization
3:   while true do
4:     ℓ_D ← L_D(M^δ[π_θ], ψ)
5:     if ℓ_D = 0 then
6:       Dump π_θ to a verified controller list
7:     end if
8:     ∇_θ L_D ← (1/N) Σ_{k=1}^{N} [(L_D(M^δ[π_{θ+νω_k}], ψ) − L_D(M^δ[π_{θ−νω_k}], ψ)) / ν] · ω_k
9:     θ ← θ − η · ∇_θ L_D where η is a learning rate
10:   end while
11: end procedure
```
#### 4.3 Controller Synthesis

The unique feature of our controller synthesis algorithm is that it leverages verification feedback on either true unsafe states or overapproximation errors introduced by verification to search for a provably correct controller.

Controller Synthesis in the Proof Space. We denote a programmatic controller π with trainable parameters θ (e.g. from the grammar in Fig. 2) by π<sub>θ</sub>. Given a closed-loop system M<sup>δ</sup>[π<sub>θ</sub>], the correctness loss function L<sub>D</sub>(M<sup>δ</sup>[π<sub>θ</sub>], ψ) is essentially a function of π<sub>θ</sub>'s parameters θ. To reduce the correctness loss of π<sub>θ</sub> over the proof space D, we leverage a gradient-descent style optimization that updates θ by taking steps proportional to the negative of the gradient of L<sub>D</sub>(M<sup>δ</sup>[π<sub>θ</sub>], ψ) at θ. As opposed to standard gradient descent optimization, we optimize π<sub>θ</sub> based on symbolic rollouts in the proof space D, using the abstract interpreter (i.e. Flow<sup>∗</sup>) directly for verification-guided controller updates.

Black-box Gradient Estimation. Directly deriving the gradients of L<sub>D</sub>, however, requires the controller verification procedure to be differentiable, which is not supported by reachability analyzers such as Flow<sup>∗</sup>. To overcome this challenge, our algorithm estimates the gradients of L<sub>D</sub> based on random search [34]. Given a closed-loop environment M<sup>δ</sup>[π<sub>θ</sub>], at each training iteration we obtain perturbed systems M<sup>δ</sup>[π<sub>θ+νω</sub>] and M<sup>δ</sup>[π<sub>θ−νω</sub>], where we add sampled Gaussian noise ω to the current controller π<sub>θ</sub>'s parameters θ in both directions and ν is a small positive real number. By evaluating the abstract correctness losses of the symbolic rollouts of M<sup>δ</sup>[π<sub>θ+νω</sub>] and M<sup>δ</sup>[π<sub>θ−νω</sub>], we update θ with a finite difference approximation along an unbiased estimator of the gradient:

$$\nabla\_{\boldsymbol{\theta}}\mathcal{L}\_{\mathcal{D}} \leftarrow \frac{1}{N} \sum\_{k=1}^{N} \frac{\left(\mathcal{L}\_{\mathcal{D}}(M^{\delta}[\pi\_{\boldsymbol{\theta} + \nu\omega\_{k}}], \; \psi) - \mathcal{L}\_{\mathcal{D}}(M^{\delta}[\pi\_{\boldsymbol{\theta} - \nu\omega\_{k}}], \; \psi)\right)}{\nu}\omega\_{k}$$

We update controller parameters θ as follows where η is a learning rate:

$$
\theta \leftarrow \theta - \eta \cdot \nabla\_{\theta} \mathcal{L}\_{\mathcal{D}}
$$
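The two update rules above can be illustrated with a minimal Python sketch. It is not the VEL implementation: `toy_loss` is a hypothetical stand-in for the correctness loss (a real run would invoke the reachability analyzer there), and the names `estimate_gradient` and `synthesize` as well as the default values of ν, N and η are assumptions chosen for illustration.

```python
import random


def estimate_gradient(loss, theta, nu=0.05, n_samples=8, rng=random):
    """Two-sided random-search estimate of the gradient of a black-box loss.

    `loss` is only ever called, never differentiated.
    """
    grad = [0.0] * len(theta)
    for _ in range(n_samples):
        # Sample one Gaussian perturbation direction omega_k.
        omega = [rng.gauss(0.0, 1.0) for _ in theta]
        plus = [t + nu * w for t, w in zip(theta, omega)]
        minus = [t - nu * w for t, w in zip(theta, omega)]
        # Finite difference of the loss along omega_k, divided by nu as in the text.
        scale = (loss(plus) - loss(minus)) / nu
        for j, w in enumerate(omega):
            grad[j] += scale * w / n_samples
    return grad


def synthesize(loss, theta, eta=0.02, max_iters=500):
    """Repeat theta <- theta - eta * grad until the loss is zero or the budget ends."""
    for _ in range(max_iters):
        if loss(theta) == 0.0:
            break
        grad = estimate_gradient(loss, theta)
        theta = [t - eta * g for t, g in zip(theta, grad)]
    return theta


def toy_loss(theta):
    # Hypothetical stand-in for the correctness loss: zero inside a small ball
    # around (1, 1), positive outside. A real run would call the verifier here.
    return max(0.0, sum((t - 1.0) ** 2 for t in theta) - 0.01)
```

Calling `synthesize(toy_loss, [3.0, -2.0])` drives the toy loss down from its starting value. The loop needs only loss evaluations, which is what makes the scheme usable with a non-differentiable verifier.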

Our high-level controller synthesis algorithm is depicted in Algorithm 1. The algorithm takes as input an environment model M<sup>δ</sup> [·], a rollout specification ψ, and a programmatic controller π learned using the programmatic reinforcement learning technique [38]. When verification fails (line 4), it uses the correctness loss of the symbolic rollout of M<sup>δ</sup> [π] for optimization (lines 8-9). The algorithm repeatedly performs the gradient-based update until a verified controller is synthesized. As the controller verification procedure is undecidable in general, it is possible that Algorithm 1 converges with a nonzero correctness loss. Our empirical results in Sec. 5 demonstrate that the algorithm works well in practice.

#### 5 Experimental Results

We have implemented the verification-guided controller synthesis technique in Algorithm 1 in a tool called VEL (VErification-based Learning) [50]. Given an environment and a rollout specification ψ (Definition 2), VEL uses the programmatic reinforcement learning algorithm [38] to learn a programmatic controller π. The controller π is trained to satisfy the safety and reachability requirements as set by ψ. We do so by shaping a reward function that is consistent with ψ: this function rewards actions leading to goal states and penalizes actions leading to unsafe states. As the RL algorithm does not provide any correctness guarantees and the verification procedure may introduce large approximation errors, even well-trained controllers may fail verification. In case of verification failures, VEL applies Algorithm 1 to optimize π based on the verification feedback.

We evaluated VEL on several nonlinear continuous or hybrid systems taken from the literature. These are problems that are widely used for evaluating state-of-the-art verification tools for learning-enabled cyber-physical systems. Benchmarks B1 - B5 were introduced by [18]; adaptive cruise control (ACC) was presented in [43]; mountain car (MC) and quadrotor with model-predictive control (QMPC) were introduced by [28]; Pendulum and CartPole were taken from [29]; Tora and Unicyclecar were presented in the ARCH-COMP21 competition on formal verification of Artificial Intelligence and Neural Network Control Systems (AINNCS). We present the dynamics and the detailed description of each benchmark in [50]. The rollout specifications (Definition 2) are depicted in Table 1. The specifications define for each benchmark the initial states, the goal regions to reach, and the safety properties describing the safety boundary or the obstacles to avoid. On three benchmarks we verify the controller correctness over an infinite horizon. For the classic control problem Pendulum, to verify that the pendulum does not fall in an infinite time horizon, the rollout specification requires that any rollout starting from the region x1, x<sup>2</sup> ∈ [−0.1, 0.1] (representing pendulum angle and angular velocity) eventually returns to it and that all rollout states are safe (including those that temporarily leave this region). Similarly, Tora models a moving cart attached to a wall with a spring.


Table 1: Benchmark Rollout Specifications (T represents True).

On Torainf, we prove that the controller for the arm of the cart connecting to the spring can stabilize the cart over an infinite horizon while maintaining safety around the origin. On Oscillatorinf, we verify that the controller can stabilize the oscillator around a target region over an infinite horizon while the process of reaching the target region from the initial states is safe.

The experimental results are given in Table 2. VEL synthesized provably correct programmatic controllers for all the benchmarks. Table 2 shows the total time spent on each benchmark (T.T) as well as the verification time of the final controller (V.T). Half of the benchmarks can be directly verified with the initial programmatic controller (in Table 2, T.T for these benchmarks is empty as they only need one pass of verification in V.T). The other half must go through the verification-guided controller learning loop due to approximation errors in verification although these controllers achieved satisfactory test performance. We depict the learning performance of VEL on these benchmarks in Fig. 3 averaged over 5 random seeds. The results show that VEL can robustly and reliably reduce the correctness loss over symbolic rollouts (i.e. the verification feedback) to zero.

Table 2: Experiment Results. Depth shows the height of the abstract syntax tree of a programmatic controller. T.T shows the overall execution time of VEL including both the time for reachability analysis and verification-guided controller synthesis. V.T measures only the verification time for the final controller. If a controller can be verified directly without verification-guided optimization, the value of T.T is empty. The execution times for ReachNN<sup>∗</sup> and Verisig measure the cost of verifying a neural network controlled system (NNCS). The notation of the size (n × k) indicates a neural network (with sigmoid activations) with n hidden layers and k neurons per layer. If a property could not be verified, it is marked as Unknown. N/A means that the tool is not applicable to a benchmark.


Table 2 also shows the results of verifying the benchmarks as neural network controlled systems (NNCS) using two state-of-the-art verification tools ReachNN<sup>∗</sup> [18] and Verisig [28] where the controllers are trained as neural networks. We note that VEL is designed for programmatic controllers and uniquely has a verification-guided learning loop. Here our intention is not to compare the tools' performance. Instead, Table 2 demonstrates that integrating verification in training loops for programmatic controllers is more tractable than for neural network controllers. It shows that programmatic controller verification (column V.T) has a much lower computation cost compared to verifying neural network controllers using ReachNN<sup>∗</sup> and Verisig except for MountainCar<sup>2</sup>. When ReachNN<sup>∗</sup> and Verisig produce Unknown, the tools are not able to verify the rollout specification due to the large estimated approximation errors in verification. On Tora, ReachNN<sup>∗</sup> spent over 13000s to produce imprecise flowpipes with large approximation errors that cannot be used for verification. In this case, repeatedly conducting neural network controller verification in a learning loop is computationally infeasible. On the other hand, VEL makes verification-guided controller synthesis feasible as evidenced in Table 2 and Fig. 3. It efficiently uses the programmatic controller verification feedback to reduce the correctness loss over the abstraction of controller reachable states to 0 in the verification proof space (even if the abstraction may introduce approximation errors).

<sup>2</sup> MountainCar is a hybrid system model. VEL is not yet optimized for hybrid system verification.

Fig. 3: Learning Performance of Verification-guided Controller Synthesis on B1, UnicycleCar, QMPC, Oscillator, ACC, and Torainf. The y-axis records the correctness loss of symbolic rollouts over abstract states. The results are averaged over 5 random seeds. VEL reliably reduces the symbolic rollout correctness loss to zero across the learning loop iterations (the x axis) for each benchmark.

#### 6 Related Work

Robust Machine Learning. Our work on using abstract interpretation [14] for controller synthesis is inspired by the recent advances in verifying neural network robustness, e.g. [23,5,40,51]. These approaches apply abstract interpretation to relax nonlinearity of activation functions in neural networks into convex representations, based on linear approximation [52,51,39,40,55] or interval approximation [26,35]. Since the abstractions are differentiable, neural networks can be optimized toward tighter concretized bounds to improve verified robustness [35,7,55,48,33]. In principle, abstract interpretation can be used to verify the reachability properties of nonlinear dynamical systems [30,37,4]. Recent work [43,28,41,18,17,29,13] has already achieved initial results on verifying neural network controlled autonomous systems by conducting reachability analysis. However, these approaches do not attempt to leverage verification feedback for controller synthesis within a learning loop, partially because of the high computational demand of repeatedly verifying neural network controllers. VEL demonstrates the substantial benefits of using verification feedback in a proof space for learning correct-by-construction programmatic controllers. Related works [25,16] conduct trajectory planning from temporal logic specifications but do not provide formal correctness guarantees. Extending VEL to support richer logic specifications is left for future work.

Safe Reinforcement Learning. Safe reinforcement learning is a fundamental problem in machine learning [36,45]. Most safe RL algorithms form a constrained optimization problem by specifying safety constraints as cost functions in addition to reward functions [1,9,15,31,42,54,53]. Their goal is to train a controller that maximizes the accumulated reward and bounds the aggregate safety violation under a threshold. However, aggregate safety costs do not support reachability constraints in the Safe RL context. In contrast, VEL ensures that a learned controller is formally verified correct and can better handle reachability constraints beyond safety. Model-based safe learning is combined with formal verification in [22] where an environment model is updated as learning progresses to take into account the deviations between the model and the actual system behavior. We leave combining VEL with model-based learning for future work.

Safe Shielding. The general idea of shielding is to use a backup controller to enforce the safety of a deep neural network controller [3]. The backup controller is less performant than the neural controller but is safe by construction using formal methods. The backup controller runs in tandem with the neural controller. Whenever the neural controller is about to leave the provably safe state space governed by the backup controller, the backup controller overrides the potentially unsafe neural actions to force the neural controller to stay within the certified safe space [2,11,21,22,24,56,6,32]. In contrast, VEL directly integrates formal verification into controller learning loops to ensure that learned controllers are correct-by-construction and hence eliminates the need for shielding.

#### 7 Conclusion

We present VEL, which bridges formal verification and synthesis for learning correct-by-construction programmatic controllers. VEL integrates formal verification into a controller learning loop to enable counterexample-guided controller optimization. VEL encodes verification feedback as a loss function of the parameters of a programmatic controller over the verification proof space. Its optimization procedure iteratively reduces both controller correctness violations by true counterexamples and overapproximation errors caused by abstraction. Our experiments demonstrate that controller updates based on verification feedback can lead to provably correct programmatic controllers. For future work, we plan to extend VEL to support controller safety during exploration in noisy environments. When a worst-case environment model is provided, this can be achieved by repeatedly leveraging the verification feedback on safety violation to project a controller back onto the verified safe space [12] after each reinforcement learning step taken on the parameter space of the controller.

Data-Availability Statement VEL is available at the repository [50]. The instructions for reproducing our experiment results are included in this repository.

Acknowledgments This work was supported in part by NSF CCF-2007799 and NSF CCF-2124155.

### References


Proceedings. Lecture Notes in Computer Science, vol. 6806, pp. 379–395. Springer (2011)


and Automation, ICRA 2020, Paris, France, May 31 - August 31, 2020. pp. 7166– 7172 (2020)



## Taming Large Bounds in Synthesis from Bounded-Liveness Specifications<sup>⋆</sup>

Philippe Heim and Rayna Dimitrova

CISPA Helmholtz Center for Information Security, Saarbrücken, Germany {philippe.heim, dimitrova}@cispa.de

Abstract. Automatic synthesis from temporal logic specifications is an attractive alternative to manual system design, due to its ability to generate correct-by-construction implementations from high-level specifications. Due to the high complexity of the synthesis problem, significant research efforts have been directed at developing practically efficient approaches for restricted specification language fragments. In this paper we focus on the Safety LTL fragment of Linear Temporal Logic (LTL) syntactically extended with bounded temporal operators. We propose a new synthesis approach with the primary motivation to solve efficiently the synthesis problem for specifications with bounded temporal operators, in particular those with large bounds. The experimental evaluation of our method shows that for this type of specification it outperforms state-of-the-art synthesis tools, demonstrating that it is a promising approach to efficiently treating quantitative timing constraints in safety specifications.

### 1 Introduction

Reactive synthesis [8] has the goal of automatically generating an implementation from a formal specification that describes the desired behavior of a reactive system. The system requirements are typically specified using temporal logics such as Linear Temporal Logic (LTL). Temporal logics are expressive, high-level specification languages capable of describing rich properties, such as, for example, robotic missions [16]. Specifications of reactive systems often include requirements of the form "something good eventually happens". These can be expressed in LTL via the temporal operators U ("until") and ◇ ("eventually"). "Eventually" is an abstraction for the existence of some unknown time point in the future of a system execution when some property holds true. While this abstraction is useful for avoiding over-specification, there are many situations in which there are practical bounds on the time within which a requirement must be met. In such cases, it is vital that the synthesis procedure checks if the timing requirements are realizable, and synthesizes an implementation that adheres to these bounds.

As a simple example, consider a specification of the desired behavior of a controller for the front door of an office building. Our specification states that the door must always be locked at night, and unlocked otherwise. It also stipulates that in the event of a fire the door should eventually open. Formulated like this, the specification is realizable. However, in case of a fire during the night the synthesized implementation will only open the door at the start of the day. Clearly, this is not the behavior we intended! We can specify the actual desired behavior in LTL by using the temporal operator ◯ ("next"), which allows us to state that a property should hold at the next time step. However, we would need to use nested ◯ operators in order to express the required time bounds. This can quickly become inconvenient, especially if we need to specify various different time bounds, some of them large. This modeling inconvenience and the increase of specification size are easily avoided by adding bounded versions of the temporal operators as syntactic sugar, without increasing expressiveness.

<sup>⋆</sup> Philippe Heim carried out this work as PhD candidate at Saarland University, Germany.

© The Author(s) 2023

S. Sankaranarayanan and N. Sharygina (Eds.): TACAS 2023, LNCS 13994, pp. 251–269, 2023. https://doi.org/10.1007/978-3-031-30820-8_17

Due to their practical significance, fragments of LTL in which the formulas (in negation normal form) include only bounded versions of the U and ◇ operators have attracted considerable attention. The most prominent such fragment is Safety LTL, the until-free fragment of LTL in negation normal form. Since Safety LTL is a syntactic fragment of LTL, it can express bounded liveness properties only via nested next operators. Another notable example is the logic Extended Bounded Response LTL (LTL<sub>EBR</sub>) [9], which is a fragment of LTL that includes bounded temporal operators as well as unbounded universal temporal operators (i.e., "globally" and "release"). While every LTL<sub>EBR</sub> formula can be expressed in Safety LTL, one significant advantage of LTL<sub>EBR</sub> is that the bounds of the temporal operators are represented in binary, which allows for exponentially more succinct formulas. However, in the course of the synthesis procedure presented in [9] these bounds are expanded into nested "next" operators. Keeping bounds symbolic is identified in [9] as an interesting direction for future developments. Indeed, in many practically relevant cases large bounds are unavoidable due to requirements on the same system across different time-scales.

In this paper we address this challenge by proposing a synthesis procedure for an extension of Safety LTL with bounded operators. We develop dedicated techniques for handling the temporal bounds symbolically and efficiently.

Contribution. We propose a synthesis method for specifications expressed in a fragment of LTL which is a syntactic extension of Safety LTL with bounded temporal operators. The distinguishing characteristic of our method is a reduction to a dedicated game model, called countdown-timer games, in which the temporal operators' bounds are treated symbolically via the introduction of timers. Further features of the translation are techniques for on-the-fly pruning of edges in the constructed game and reduction of the number of introduced timers. We present an abstraction-based method for solving the resulting games. We have developed a prototype implementation of our approach, and the experimental evaluation demonstrates that it is indeed capable of handling efficiently safety specifications with large bounds. We demonstrate that on a set of benchmarks featuring bounded temporal operators with large bounds, our technique outperforms state-of-the-art tools for LTL<sub>EBR</sub> and LTL synthesis.

Related Work. The synthesis problem for Safety LTL has attracted significant interest due to its algorithmic simplicity compared to general LTL synthesis [25]. For instance, the symbolic approach presented in [25] is shown to outperform the state-of-the-art LTL synthesis tools at the time. For LTL<sub>EBR</sub>, [9] proposes a synthesis algorithm based on a fully symbolic translation to deterministic safety automata. A key difference between our approach and the above techniques is that our countdown-timer game construction does not expand the bounded temporal operators upfront, but treats them symbolically instead. Furthermore, the authors of [25] point out that for large Safety LTL formulas the construction of the deterministic safety automaton presents a performance bottleneck. Our safety game construction makes use of pruning in order to alleviate this problem by eliminating on-the-fly parts of the game graph that need not be explored.

Parameterized temporal logics, such as PLTL [1], enable the specification of parametric lower and upper bounds on the satisfaction time of the "globally" operator and the wait time of "eventually". In the logic prompt-LTL [17], only eventualities are parameterized by upper bounds. The bounds of the temporal operators in these logics are unknown parameters, while in the case that we consider, the bounds are given integer constants. The goal of our work is to develop a synthesis method that treats constant bounds efficiently.

In the real-time setting, temporal logics that allow for limiting the time scope of temporal operators have been extensively studied. Notable logics are Metric Temporal Logic (MTL) [15], and its fragment Metric Interval Temporal Logic (MITL) [2]. Compared to the untimed setting, synthesis from real-time logic specifications poses additional challenges. Controller synthesis is undecidable for MTL [4], for MITL [5,11], and even for the safety fragment of MTL [5]. Decidability is regained by fixing the resources (clocks and guards) of the controller [5,12]. The key challenge stems from the fact that synthesis requires deterministic automata, and it is not generally possible to construct deterministic timed automata for MITL. To circumvent this problem, the assumption of bounded variability is commonly made. Under this assumption, [20] proposes a synthesis algorithm for bounded response properties, and a translation from MTL to deterministic timed automata is presented in [23]. With respect to tool support, sound but incomplete synthesis methods for fragments of MTL have been proposed in [6] and [18], and implemented in toolchains that employ Uppaal-Tiga [3] for timed games solving. A tool for MTL controller synthesis via translation to alternating timed automata was presented in [14]. In the case when the real-time synthesis problem is given as a timed game and the specification is a state-based winning condition, the problem of computing a control strategy is decidable [21]. Efficient on-the-fly algorithms for timed games have been developed [7], and successfully implemented in Uppaal-Tiga [3] and Uppaal-Stratego [10]. Since we are interested in discrete-time systems, we circumvent the additional challenges present in the dense-time setting by remaining in the realm of discrete time and focusing on efficiently treating quantitative timing constraints there.

### 2 Preliminaries

Reactive Synthesis. Let I be a finite set of uncontrollable environment input Boolean propositions and O be a finite set of controllable output Boolean propositions. A reactive system is a tuple (C, c<sub>0</sub>, γ) where C is a set of control states, c<sub>0</sub> ∈ C is the initial control state, and γ : C × 2<sup>I</sup> → C × 2<sup>O</sup> is the transition function. A specification is a language L ⊆ (2<sup>I∪O</sup>)<sup>ω</sup> of infinite words over I ∪ O.

A system (C, c<sub>0</sub>, γ) realizes a specification L if for all infinite sequences of environment inputs i ∈ (2<sup>I</sup>)<sup>ω</sup> it yields an output sequence o ∈ (2<sup>O</sup>)<sup>ω</sup> defined by (c<sub>t+1</sub>, o<sub>t</sub>) = γ(c<sub>t</sub>, i<sub>t</sub>) for t ∈ ℕ, such that i ∪ o ∈ L. Reactive synthesis is the problem of finding a realizing implementation for a given specification.
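To make the definition of a reactive system concrete, the following Python sketch runs a transition function on a finite prefix of environment inputs. The two-state door controller, the proposition names `fire` and `open`, and the helper `run` are hypothetical and serve only as an illustration.

```python
def run(system, inputs):
    """Feed a finite prefix of inputs to a reactive system (C, c0, gamma).

    Returns the outputs o_0, o_1, ... produced by iterating
    (c_{t+1}, o_t) = gamma(c_t, i_t).
    """
    _states, c, gamma = system
    outputs = []
    for i in inputs:
        c, o = gamma(c, frozenset(i))
        outputs.append(o)
    return outputs


# Hypothetical controller with I = {"fire"} and O = {"open"}:
# it opens the door one step after a fire was observed.
def gamma(c, i):
    next_state = "alarm" if "fire" in i else "idle"
    output = frozenset({"open"}) if c == "alarm" else frozenset()
    return next_state, output


door = ({"idle", "alarm"}, "idle", gamma)
trace = run(door, [{"fire"}, set(), set()])
```

Here `trace` is the output prefix for the input prefix "fire, nothing, nothing". Whether the system realizes a specification depends on all infinite input sequences, so a finite run like this can only refute realization, not establish it.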

Safety LTL with Bounded Liveness Operators. We consider specifications expressed using temporal logic, more concretely, in a fragment of LTL [24], which we denote by SafeLTL<sup>B</sup>. The fragment SafeLTL<sup>B</sup> is a syntactic extension of Safety LTL [25] and is defined by the following grammar:

$$\varphi, \psi := ap \mid \neg ap \mid \varphi \land \psi \mid \varphi \lor \psi \mid \Diamond[n]\varphi \mid \bigcirc[n]\varphi \mid \varphi \mathcal{W}[n]\psi \mid \varphi \mathcal{W}\psi$$

for ap ∈ I ∪ O and n ∈ ℕ. SafeLTL<sup>B</sup> extends Safety LTL by bounded operators with bounds encoded in binary. While all bounded operators have equivalent Safety LTL formulas (e.g. ◇[n]ϕ ≡ ⋁<sub>i∈{0...n}</sub> ◯<sup>i</sup> ϕ), these have exponentially larger encodings. The constants ⊤ (true), ⊥ (false), the "globally" operator □ and "bounded until" U[n] can be derived as ⊤ := a ∨ ¬a, ⊥ := a ∧ ¬a, □ϕ := ϕ W ⊥, □[n]ϕ := ϕ W[n] ⊥, and ϕ U[n] ψ := (ϕ W[n] ψ) ∧ ◇[n]ψ, respectively.
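The size gap between a binary-encoded bound and its unfolding into plain Safety LTL can be seen in a small Python sketch. The ASCII rendering (`X` for "next", `|` for disjunction) and the function names are assumptions made for this illustration; the unfolding is the naive one without subformula sharing.

```python
def expand_bounded_eventually(n, phi="a"):
    """Unfold 'phi holds within n steps' into a disjunction of nested next operators."""
    disjuncts = ["X " * i + phi for i in range(n + 1)]
    return " | ".join(disjuncts)


def sizes(n):
    """Return (bits needed to write the bound n, next operators after unfolding)."""
    unfolded = expand_bounded_eventually(n)
    return n.bit_length(), unfolded.count("X")
```

For example, `expand_bounded_eventually(2)` yields `a | X a | X X a`, and `sizes(1000)` reports 10 bits for the bound against 500500 next operators in the unfolded formula.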

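The satisfaction of a formula Φ ∈ SafeLTL<sup>B</sup> by an infinite word w = w<sub>0</sub>w<sub>1</sub> . . . ∈ (2<sup>I∪O</sup>)<sup>ω</sup> at time point k ∈ ℕ is denoted as w ⊨<sub>k</sub> Φ and is defined as follows:

$$\begin{aligned}
w \models\_k a &:\Leftrightarrow a \in w\_k \\
w \models\_k \neg a &:\Leftrightarrow a \notin w\_k \\
w \models\_k \varphi \land \psi &:\Leftrightarrow (w \models\_k \varphi) \land (w \models\_k \psi) \\
w \models\_k \varphi \lor \psi &:\Leftrightarrow (w \models\_k \varphi) \lor (w \models\_k \psi) \\
w \models\_k \Diamond[n]\varphi &:\Leftrightarrow \exists i \le n.\; w \models\_{k+i} \varphi \\
w \models\_k \bigcirc[n]\varphi &:\Leftrightarrow w \models\_{k+n} \varphi \\
w \models\_k \varphi \mathcal{W}[n] \psi &:\Leftrightarrow (\forall i \le n.\; w \models\_{k+i} \varphi) \lor (\exists j \le n.\; w \models\_{k+j} \psi \land \forall i < j.\; w \models\_{k+i} \varphi) \\
w \models\_k \varphi \mathcal{W} \psi &:\Leftrightarrow (\forall i.\; w \models\_{k+i} \varphi) \lor (\exists j.\; w \models\_{k+j} \psi \land \forall i < j.\; w \models\_{k+i} \varphi).
\end{aligned}$$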
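The satisfaction relation can be prototyped directly. The Python sketch below evaluates formulas on ultimately periodic words, given as a finite prefix followed by a non-empty loop that repeats forever. The tuple encoding of formulas (`"F"` for bounded eventually, `"X"` for bounded next, `"W"` for weak until with bound `None` meaning unbounded) is an assumption of this illustration, not notation from the paper.

```python
def holds(word, k, f):
    """Check whether the lasso word (prefix, loop) satisfies formula f at position k."""
    prefix, loop = word

    def letter(j):
        # Letter at position j of the infinite word prefix . loop^omega.
        if j < len(prefix):
            return prefix[j]
        return loop[(j - len(prefix)) % len(loop)]

    op = f[0]
    if op == "ap":
        return f[1] in letter(k)
    if op == "not":
        return f[1] not in letter(k)
    if op == "and":
        return holds(word, k, f[1]) and holds(word, k, f[2])
    if op == "or":
        return holds(word, k, f[1]) or holds(word, k, f[2])
    if op == "F":  # bounded eventually: some position k..k+n satisfies the operand
        return any(holds(word, k + i, f[2]) for i in range(f[1] + 1))
    if op == "X":  # bounded next: position k+n satisfies the operand
        return holds(word, k + f[1], f[2])
    if op == "W":  # weak until, bounded by n or unbounded if n is None
        n, g, h = f[1], f[2], f[3]
        # On a lasso word, suffixes repeat after len(prefix) + len(loop) positions,
        # so this horizon suffices for the unbounded case.
        horizon = n if n is not None else len(prefix) + len(loop)
        for i in range(horizon + 1):
            if holds(word, k + i, h):
                return True
            if not holds(word, k + i, g):
                return False
        return True
    raise ValueError(f"unknown operator {op!r}")


# Hypothetical word: fire at step 0, door opens at step 2, then nothing happens.
word = ([{"fire"}, set(), {"open"}], [set()])
within_two = holds(word, 0, ("F", 2, ("ap", "open")))
within_one = holds(word, 0, ("F", 1, ("ap", "open")))
```

In this example the door opens within two steps of position 0 but not within one, so `within_two` is true and `within_one` is false.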

The language of Φ ∈ SafeLTL<sup>B</sup> is defined as L(Φ) := {w ∈ (2<sup>I∪O</sup>)<sup>ω</sup> | w ⊨<sub>0</sub> Φ}.

Two-Player Safety Games. The synthesis problem for temporal logic specifications can be solved by translating the specification into a two-player game between the system and the environment, and then solving the game to determine the winning player. If the system wins, an implementation can be extracted.

A game structure is a tuple G = (S, S<sub>0</sub>, I, O, ρ), where S is a set of states, S<sub>0</sub> ⊆ S is a set of initial states, I and O are sets of propositions as defined earlier, and ρ : S × 2<sup>I</sup> × 2<sup>O</sup> → S is a transition function. A game on G is played by two players, the system and the environment. In a given state s ∈ S, the environment chooses some input i ⊆ I, then the system chooses some output o ⊆ O, and these choices determine the next state s′ := ρ(s, i, o). The game then continues from s′. The resulting infinite sequence π = s<sub>0</sub>, s<sub>1</sub>, s<sub>2</sub>, . . . of states is called a play. Formally, a play is a sequence π = s<sub>0</sub>, s<sub>1</sub>, s<sub>2</sub>, . . . ∈ S<sup>ω</sup> such that s<sub>0</sub> ∈ S<sub>0</sub> and for every t ∈ ℕ, s<sub>t+1</sub> = ρ(s<sub>t</sub>, i, o) for some i ⊆ I and o ⊆ O. A system strategy is a function σ : S<sup>+</sup> × 2<sup>I</sup> → 2<sup>O</sup>. An environment strategy is a function π : S<sup>+</sup> → 2<sup>I</sup>. Given a state s ∈ S, a system strategy σ and an environment strategy π, we denote with Outcome(s, π, σ) the unique play s<sub>0</sub>, s<sub>1</sub>, s<sub>2</sub>, . . . such that s<sub>0</sub> = s, and for all k ∈ ℕ, s<sub>k+1</sub> = ρ(s<sub>k</sub>, i<sub>k</sub>, σ((s<sub>0</sub>, s<sub>1</sub>, . . . , s<sub>k</sub>), i<sub>k</sub>)), where i<sub>k</sub> = π((s<sub>0</sub>, s<sub>1</sub>, . . . , s<sub>k</sub>)).

A safety game is a tuple (G, UNSAFE) where UNSAFE ⊆ S is the set of unsafe states. The system wins the safety game if it has a strategy σ such that for all environment strategies π, all s<sub>0</sub> ∈ S<sub>0</sub> and all k ∈ ℕ, it holds that Outcome(s<sub>0</sub>, π, σ)<sub>k</sub> ∉ UNSAFE. Such a strategy is called a winning strategy for the system. Intuitively, the system has to avoid the unsafe states no matter what the environment does. The environment wins if it can enforce a visit to UNSAFE, i.e., when there exist an environment strategy π and s<sub>0</sub> ∈ S<sub>0</sub> such that for every system strategy σ there exists k ∈ ℕ such that Outcome(s<sub>0</sub>, π, σ)<sub>k</sub> ∈ UNSAFE.
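For finite, explicitly represented safety games, the winning region of the system can be computed by a standard greatest fixpoint: repeatedly discard states from which the environment has an input that no system output can answer safely. The Python sketch below illustrates this textbook procedure. It is not the abstraction-based method of this paper, and the request/grant game is a hypothetical example.

```python
def winning_region(states, unsafe, inputs, outputs, rho):
    """States from which the system can avoid `unsafe` forever (greatest fixpoint)."""
    win = set(states) - set(unsafe)
    changed = True
    while changed:
        changed = False
        for s in list(win):
            # The environment moves first, so every input needs a safe reply.
            has_safe_reply = all(
                any(rho(s, i, o) in win for o in outputs) for i in inputs
            )
            if not has_safe_reply:
                win.remove(s)
                changed = True
    return win


# Hypothetical game: a request must be granted in the step after it arrives.
STATES = {"idle", "pending", "late", "bad"}
INPUTS = [frozenset(), frozenset({"req"})]
OUTPUTS = [frozenset(), frozenset({"grant"})]


def rho(s, i, o):
    if s in ("bad", "late"):
        return "bad"  # a missed deadline cannot be repaired
    if s == "pending":
        return "idle" if "grant" in o else "late"
    # s == "idle"
    return "pending" if "req" in i and "grant" not in o else "idle"


win = winning_region(STATES, {"bad"}, INPUTS, OUTPUTS, rho)
```

In this example the system wins from `idle` and `pending`, while `late` is losing because every successor is unsafe. The system wins the game if all initial states lie in the computed region.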

### 3 SafeLTL<sup>B</sup> Synthesis with Countdown-Timer Games

SafeLTL<sup>B</sup> Synthesis. We consider the realizability and synthesis problems for the fragment SafeLTL<sup>B</sup>. We focus on the challenge of handling efficiently specifications with large bounds in the bounded temporal operators, and propose a new synthesis method towards achieving this goal. The proposed approach proceeds in two stages. In the first stage, the given SafeLTL<sup>B</sup> formula is transformed into a kind of safety game, in which bounds are treated symbolically. We term these games countdown-timer games, introduced later in this section. The second stage of our synthesis algorithm is the solving of the generated countdown-timer game in order to determine the winning player and answer the realizability question. We propose in Section 5 a method that employs symbolic representation and approximations in order to efficiently solve such games in practice.

Countdown-Timer Games. Intuitively, countdown-timer games are like safety games but with additional countdown-timers. Countdown-timers are discrete timers that always start with an assigned duration and are decremented by one with every transition in the game. Once a timer reaches zero it times out, and the transition relation of the countdown-timer game may depend on this information for determining the successor state. A countdown-timer can be reset to the duration associated with it. In addition, countdown-timers with the same duration can swap their values, which we will later use when generating timer games to avoid unnecessary blowup in the number of timers.

Definition 1 (Countdown-Timer Games). A countdown-timer game structure is a tuple G<sup>T</sup> = (T, d, L, L<sub>0</sub>, I, O, δ) where T is a finite set of countdown timers, d : T → ℕ associates a duration with each timer, L is a finite set of game locations, L<sub>0</sub> ⊆ L is the set of initial locations, I, O are finite sets of uncontrollable environment input propositions and controllable system propositions, respectively, and δ : L × 2<sup>I</sup> × 2<sup>O</sup> × 2<sup>T</sup> → L × E is the transition relation. E := T → (T ∪ {RESET}) is the set of effects where for all e ∈ E:

1. for all t ∈ T, either e(t) = RESET, or e(t) ∈ T and d(e(t)) = d(t), and
2. for t<sub>1</sub>, t<sub>2</sub> ∈ T with t<sub>1</sub> ≠ t<sub>2</sub>, we have e(t<sub>1</sub>) ≠ e(t<sub>2</sub>) or e(t<sub>1</sub>) = e(t<sub>2</sub>) = RESET.

A countdown-timer game is a pair (G<sup>T</sup>, UNSAFE<sup>L</sup>) where UNSAFE<sup>L</sup> ⊆ L is a set of unsafe locations.

The effects E capture the resets and remapping of timers that can occur upon transitions. Condition (1) states that each timer is either reset or remapped to a timer with the same duration. Condition (2) requires the remapping to be injective, i.e. no two timers are mapped to the same timer. When timers are not reset and not remapped to other timers, they are simply mapped to themselves.
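Conditions (1) and (2) are easy to check mechanically. The following Python sketch represents an effect as a dictionary from timer names to either a timer name or a reset marker. The representation, the function name `is_valid_effect` and the example timers are assumptions of this illustration, and it presumes that no timer is itself named `"RESET"`.

```python
RESET = "RESET"


def is_valid_effect(effect, duration):
    """Check conditions (1) and (2) for an effect over the timers in `duration`."""
    targets = []
    for t in duration:
        target = effect[t]
        if target == RESET:
            continue
        # Condition (1): a remapped timer must point at a timer of equal duration.
        if target not in duration or duration[target] != duration[t]:
            return False
        targets.append(target)
    # Condition (2): the non-reset part of the mapping must be injective.
    return len(targets) == len(set(targets))


# Hypothetical timers: two of duration 5 and one of duration 9.
duration = {"t0": 5, "t1": 5, "u": 9}
swap = {"t0": "t1", "t1": "t0", "u": "u"}
clash = {"t0": "t1", "t1": "t1", "u": RESET}
```

The effect `swap` exchanges the two timers of equal duration and is valid, whereas `clash` maps two distinct timers to the same timer and violates condition (2).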

The semantics of a countdown-timer game is the safety game generated by explicitly expanding the possible valuations of the timers. Intuitively, each state of the game structure is a pair s = (l, v) of a location l ∈ L and a timer valuation v. Initially, each timer t is set to its associated duration d(t). The transition relation updates the values of the timers by first decrementing them and then applying the effect e of the corresponding transition in G<sup>T</sup> . The relevant transition in G<sup>T</sup> is determined by the location l, the input and output sets i and o, and the set of timers whose value has become 0 after the decrementation.

Definition 2 (Countdown-Timer Games Semantics). In the context of Definition 1, let V := {v : T → ℕ | ∀t ∈ T. v(t) ≤ d(t)} be the space of all possible timer valuations. Let G = (L × V, L<sub>0</sub> × {λt.d(t)}, I, O, ρ) be a game structure where ρ((l, v), i, o) := trans(l, step(v), i, o) with

$$\begin{aligned} step(v) &:= \lambda t.\, \max\{0, v(t) - 1\} \\ trans(l, v, i, o) &:= \left(l', \lambda t. \begin{cases} v(e(t)) & \text{if } e(t) \in \mathcal{T} \\ d(t) & \text{if } e(t) = RESET \end{cases} \right) \\ &\quad\; \text{where } (l', e) := \delta(l, i, o, \{t \in \mathcal{T} \mid v(t) = 0\}). \end{aligned}$$

The semantics of the countdown-timer game (G<sup>T</sup> , UNSAFE <sup>L</sup>) is the safety game (G, UNSAFE<sup>L</sup> × V). The system (environment) wins the countdown-timer game if and only if it wins the safety game representing its semantics.
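The expansion in Definition 2 can be sketched in a few lines of Python. The functions `step`, `trans` and `rho` follow the definition, while the dictionary representation of valuations and effects, the one-timer example game and the helper `play` are assumptions made for illustration. The example game requires the output proposition `a` to occur within the first three steps.

```python
RESET = "RESET"


def step(v):
    """Decrement every timer by one, saturating at zero."""
    return {t: max(0, x - 1) for t, x in v.items()}


def trans(delta, d, loc, v, i, o):
    """Take the location transition selected by the timed-out timers, apply its effect."""
    timed_out = frozenset(t for t, x in v.items() if x == 0)
    new_loc, e = delta(loc, i, o, timed_out)
    new_v = {t: (d[t] if e[t] == RESET else v[e[t]]) for t in d}
    return new_loc, new_v


def rho(delta, d, state, i, o):
    """Transition function of the expanded safety game: decrement, then transition."""
    loc, v = state
    return trans(delta, d, loc, step(v), i, o)


# Hypothetical game with one timer of duration 3 and locations wait/top/bot.
d = {"t": 3}
KEEP = {"t": "t"}  # effect that maps the timer to itself


def delta(loc, i, o, timed_out):
    if loc != "wait":
        return loc, KEEP  # top and bot are sinks
    if "a" in o:
        return "top", KEEP
    if "t" in timed_out:
        return "bot", KEEP  # deadline passed without seeing a
    return "wait", KEEP


def play(outputs):
    """Run the expanded game from the initial state on a finite output sequence."""
    state = ("wait", dict(d))
    for o in outputs:
        state = rho(delta, d, state, frozenset(), frozenset(o))
    return state
```

With this encoding, `play([set(), set()])` is still in location `wait` with the timer at 1, a third step without `a` leads to `bot`, and a third step with `a` leads to `top`. The single location `wait` thus stands for three states of the expanded game, one per timer value.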

### 4 Countdown-Timer Game Construction

We now present the first phase of our synthesis algorithm, namely the translation of a SafeLTL<sup>B</sup> formula to a countdown-timer game. Our construction is based on expansion rules. For example, the formula [50]a is equivalent to a∨ [49]a. If a is true, then the whole formula is true. Otherwise, in the next step [49]a has to hold. Interpreted as a state of a safety game, [50]a has a transition to > on a = > and to [49]a on a = ⊥. This can be repeated on [49]a and so on. Once we reach [0]a we expand it to a ∨ ⊥, and hence, a = ⊥ leads to ⊥ which is the unsafe state. This construction works for safety formulas, as rejection can be decided with a finite prefix. As we show later, generating a game structure in this way has the advantage that it can be pruned using information from the formula.

However, this explicit expansion yields a sequence of formulas that is linear in the bound, and hence, exponential in the description of the formula. Instead of explicit bounds, we use countdown-timers representing multiple values. In the above example, we do not generate all the expansions ◇[50]a, ..., ◇[0]a, but instead a timer t with duration 51 to represent all expansions from 50 to 0 in the single location a ∨ ◇[t]a. If t times out, ◇[t]a has reached the end of the expansion and is transformed to ⊥. Hence, instead of having ◇[50]a, ..., ◇[0]a, ⊤ and ⊥ as states of a safety game we only have the locations a ∨ ◇[t]a, ⊤ and ⊥ in a countdown-timer game. We now describe this construction formally.
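To make the size argument concrete, here is a small Python sketch that enumerates the states of the explicit safety game for the bounded-eventually example with bound 50. The string names such as `F[50]a`, `TRUE` and `FALSE` are a hypothetical textual encoding chosen for this illustration only.

```python
def explicit_states(bound):
    """Enumerate states and transitions of the explicit game for 'a within bound steps'."""
    states = {"TRUE", "FALSE"}
    transitions = {}
    for n in range(bound, -1, -1):
        name = f"F[{n}]a"
        states.add(name)
        # If a holds, the obligation is fulfilled; otherwise the bound shrinks,
        # and at bound 0 the unsafe state FALSE is reached.
        successor = f"F[{n - 1}]a" if n > 0 else "FALSE"
        transitions[name] = {True: "TRUE", False: successor}
    return states, transitions


states, transitions = explicit_states(50)
print(len(states))  # 51 expansions plus TRUE and FALSE
```

The explicit game has one state per remaining bound, whereas the countdown-timer game described in the text needs only three locations for the same formula, independently of the bound.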

### 4.1 Construction of a Countdown-Timer Game from SafeLTL<sup>B</sup>

The locations of the generated countdown-timer games are SafeLTL<sup>B</sup> formulas with, additionally, timers as bounds of the temporal operators. We denote the set of these formulas as SafeLTL<sup>t</sup> <sup>B</sup>. Given a set of timers T , the grammar of SafeLTL<sup>t</sup> <sup>B</sup> is the grammar of SafeLTL<sup>B</sup> but in ◯[n], ◇[n], and W[n] we have n ∈ N ∪ T . For ϕ ∈ SafeLTL<sup>t</sup> <sup>B</sup>, Timers(ϕ) ⊆ T denotes the set of all timers appearing in ϕ.

Game Structure Let Φ be a SafeLTL<sup>B</sup> formula over input propositions I and output propositions O. We construct a countdown-timer game structure (T , d, L, L<sub>0</sub>, I, O, δ) as follows. The set of timers

$$\mathcal{T} := \{ t_i^d \mid \bigcirc[d],\ \Diamond[d-1], \text{ or } \mathcal{W}[d-1] \text{ occurs in } \Phi,\ 0 \le i \le d \}$$

consists of timers t<sup>d</sup><sub>i</sub> with index i and duration d(t<sup>d</sup><sub>i</sub>) := d for 0 ≤ i ≤ d. The duration of a timer determines the bounds of the temporal operators in Φ for which it can be used, and the indices are used for distinguishing multiple timers of the same duration (introduced at different points of the expansion).

Let L := PositiveBooleanCombinations(cl(Φ)) (i.e., built from cl(Φ) using ∧, ∨) be the set of locations, where cl is the closure operator defined as:

$$\begin{array}{lcl} cl(l) & := & \{l, \top, \perp\} \qquad l \in \{ap, \neg ap\} \\ cl(\varphi \circ \psi) & := & cl(\varphi) \cup cl(\psi) \qquad \circ \in \{\wedge, \vee\} \\ cl(\bigcirc[n]\varphi) & := & cl(\varphi) \cup \{\bigcirc[t_i^n]\varphi \mid 0 \le i \le n\} \\ cl(\Diamond[n]\varphi) & := & cl(\varphi) \cup \{\Diamond[t_i^{n+1}]\varphi \mid 0 \le i \le n+1\} \\ cl(\varphi \, \mathcal{W}[n]\psi) & := & cl(\varphi) \cup cl(\psi) \cup \{\varphi \, \mathcal{W}[t_i^{n+1}]\psi \mid 0 \le i \le n+1\} \\ cl(\varphi \, \mathcal{W}\psi) & := & cl(\varphi) \cup cl(\psi) \cup \{\varphi \, \mathcal{W}\psi\}. \end{array}$$

Intuitively, the closure contains all possible temporal-operator sub-formulas and literals that can appear during expansion. The locations L then represent the expanded formulas, which, intuitively, correspond to the current obligations of the system. Thus, the initial location will correspond to obligation Φ. Note that L ⊆ SafeLTL<sup>t</sup> <sup>B</sup>. We apply simplifications to the generated formulas to ensure that L is finite. Since by definition cl(Φ) is finite, we can ensure that |L| ≤ 2<sup>|cl(Φ)|</sup>.
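As an illustration of the closure operator, the sketch below computes cl over a small tuple-based formula encoding. The encoding is an assumption made for this example: `("lit", name)` for literals, `("next", n, φ)` for the bounded operator whose timers have duration n, `("ev", n, φ)` and `("wb", n, φ, ψ)` for the bounded operators whose timers have duration n + 1, `("w", φ, ψ)` for unbounded weak until, and `("t", d, i)` for the timer with duration d and index i.

```python
TOP, BOT = ("top",), ("bot",)


def timer(d, i):
    """Timer with duration d and index i."""
    return ("t", d, i)


def cl(phi):
    """Closure of a formula, following the case distinction of the definition."""
    kind = phi[0]
    if kind == "lit":
        return {phi, TOP, BOT}
    if kind in ("and", "or"):
        return cl(phi[1]) | cl(phi[2])
    if kind == "next":  # timers of duration n, indices 0..n
        _, n, sub = phi
        return cl(sub) | {("next", timer(n, i), sub) for i in range(n + 1)}
    if kind == "ev":  # timers of duration n + 1, indices 0..n+1
        _, n, sub = phi
        return cl(sub) | {("ev", timer(n + 1, i), sub) for i in range(n + 2)}
    if kind == "wb":  # bounded weak until, timers of duration n + 1
        _, n, left, right = phi
        variants = {("wb", timer(n + 1, i), left, right) for i in range(n + 2)}
        return cl(left) | cl(right) | variants
    if kind == "w":  # unbounded weak until
        return cl(phi[1]) | cl(phi[2]) | {phi}
    raise ValueError(f"unknown operator: {kind}")
```

The closure grows linearly with the bounds, since it contains one timer-indexed variant per admissible index; the locations are then positive Boolean combinations over this set.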

In the construction of the initial location and the transition function we use two helper functions, introExp : SafeLTL<sup>t</sup> <sup>B</sup> → SafeLTL<sup>t</sup> <sup>B</sup>, which performs expansion and introduces new timers, and opt : SafeLTL<sup>t</sup> <sup>B</sup> → L, which performs simplifications that ensure that L is finite. We let L<sup>0</sup> := {opt(introExp(Φ))} and

$$\delta(\varphi, i, o, T) := (opt(introExp(\psi)), e) \text{ where } (e, \psi) := squeeze(to(T, tree(\varphi, i, o))).$$

Here, we use the additional functions tree : SafeLTL<sup>t</sup> <sup>B</sup> × 2<sup>I</sup> × 2<sup>O</sup> → SafeLTL<sup>t</sup> <sup>B</sup>, which performs the input and output choices, to : 2<sup>T</sup> × SafeLTL<sup>t</sup> <sup>B</sup> → SafeLTL<sup>t</sup> <sup>B</sup>, which handles time-outs, and squeeze : SafeLTL<sup>t</sup> <sup>B</sup> → E × SafeLTL<sup>t</sup> <sup>B</sup>, which determines remapping and reset of timers. Below, we describe these functions in detail.

Remark: Note that for ◯[b] we use timers of duration b, while for ◇[b] and W[b] we use timers of duration b + 1. The reason for this is that for the latter we consider the last step as part of the timing, as this simplifies the game structure.

Before describing the functions, we illustrate them on a simple example.

Example 1. Let I = {r}, O = {g}, and consider the SafeLTL<sup>B</sup> formula Φ = (□[100]¬g) ∧ ◯[10](r → ◇[100]g). Φ states that the system should not give a grant during the first 100 steps, and, if at step 10 there is a request, then a grant should be given within the following 100 steps. We show how to construct the initial location and some of the transitions in a countdown-timer game for Φ.

Initial state ϕ<sub>0</sub> = opt(introExp(Φ)). The initial state is computed from Φ by expanding the formula and introducing any necessary timers. This is done by the function introExp. The subformula □[100]¬g expands to ¬g ∧ □[t<sup>101</sup><sub>0</sub>]¬g, reflecting the semantics of the operator □[100]. This introduces the timer t<sup>101</sup><sub>0</sub> with duration 101 and index 0. The subformula ◯[10](r → ◇[100]g) expands to ◯[t<sup>10</sup><sub>0</sub>](r → ◇[100]g), which introduces the timer t<sup>10</sup><sub>0</sub> for ◯[10]. The durations 101 and 10 of the timers correspond to the respective bounds in □[100] and ◯[10], and the index 0 is the smallest index of a currently unused timer of the respective duration. No timer is introduced at this step for ◇[100] as it is guarded by a ◯ operator. Thus, the initial state is the expanded formula ϕ<sub>0</sub> = ¬g ∧ (□[t<sup>101</sup><sub>0</sub>]¬g) ∧ ◯[t<sup>10</sup><sub>0</sub>](r → ◇[100]g).

Determining transition δ(ϕ<sub>0</sub>, ∅, {g}, ∅) = (ϕ<sub>1</sub>, e<sub>1</sub>). We apply tree(ϕ<sub>0</sub>, ∅, {g}), which computes the effect of the input ∅ and output {g} on the formula in the current step, and thus substitutes g with ⊤ in ϕ<sub>0</sub>. This results in tree(ϕ<sub>0</sub>, ∅, {g}) = ⊥, meaning that this transition leads to location ⊥.

Determining transition δ(ϕ<sub>0</sub>, ∅, ∅, {t<sup>10</sup><sub>0</sub>}) = (ϕ<sub>2</sub>, e<sub>2</sub>). Again, we first compute tree(ϕ<sub>0</sub>, ∅, ∅) = (□[t<sup>101</sup><sub>0</sub>]¬g) ∧ ◯[t<sup>10</sup><sub>0</sub>](r → ◇[100]g), which now substitutes ⊥ for g. To the result we apply the function to that handles time-outs, here {t<sup>10</sup><sub>0</sub>}, which means that the timer t<sup>10</sup><sub>0</sub> times out at the current step. As a result, the subformula ◯[t<sup>10</sup><sub>0</sub>](r → ◇[100]g) is replaced by r → ◇[100]g, meaning that the formula r → ◇[100]g becomes part of the obligation at the next step, since the timer t<sup>10</sup><sub>0</sub> has run out. Thus, we obtain to({t<sup>10</sup><sub>0</sub>}, (□[t<sup>101</sup><sub>0</sub>]¬g) ∧ ◯[t<sup>10</sup><sub>0</sub>](r → ◇[100]g)) = (□[t<sup>101</sup><sub>0</sub>]¬g) ∧ (r → ◇[100]g).

After that, we apply the function squeeze, which takes care of timers that might have become unused upon time-out. This is reflected in the effect e<sub>2</sub> that resets all timers that do not appear in the current formula. Thus, in e<sub>2</sub> the timer t<sup>10</sup><sub>0</sub> that just timed out is mapped to RESET, and the timer t<sup>101</sup><sub>0</sub> that is still present is mapped to itself.

The final step is to apply the function introExp, which performs expansion on the current formula and introduces any new timers that might be needed. The subformula □[t<sup>101</sup><sub>0</sub>]¬g expands to ¬g ∧ □[t<sup>101</sup><sub>0</sub>]¬g. The subformula r → ◇[100]g expands to r → (g ∨ ◇[t<sup>101</sup><sub>1</sub>]g), which introduces the timer t<sup>101</sup><sub>1</sub> for ◇[100]. Note that since the formula already contains the timer t<sup>101</sup><sub>0</sub> of duration 101, the newly introduced timer t<sup>101</sup><sub>1</sub> has index 1. The functions to and squeeze ensure that the order between the indices of timers of the same duration represents the order in which these timers will time out. After computing introExp((□[t<sup>101</sup><sub>0</sub>]¬g) ∧ (r → ◇[100]g)) we obtain ϕ<sub>2</sub> = ¬g ∧ (□[t<sup>101</sup><sub>0</sub>]¬g) ∧ (r → (g ∨ ◇[t<sup>101</sup><sub>1</sub>]g)).

Construction We construct the sets of locations, timers, and transitions by exploring the reachable parts of L from L<sub>0</sub>. We describe several pruning mechanisms that we use in order to keep the set of reachable locations small.

Construction Invariants. To ensure correctness and keep the game generation efficient, we maintain the following invariants for each reachable location:


Invariant (1) is needed for correctness, and for ensuring that all literals that are relevant in the current step are considered, and that all relevant bounded operators are tracked by timers. Invariant (2) ensures that we never need more than the available d timers. This holds since the timers are strictly ordered when running, and by the time we would introduce t<sup>d</sup><sub>d+1</sub>, t<sup>d</sup><sub>0</sub> would have timed out. Furthermore, ordering the timers reduces the possible combinations of time-outs. Invariant (3) prevents having unused timers that are between used ones according to the above order, thus reducing the possible combinations of equivalent locations.

Function tree: Selection of Inputs and Outputs. The function tree(ϕ, i, o) computes the effect of the input i and output o on the formula in the current step. With invariant (1) it suffices to consider literals on the Boolean top-level, i.e. literals that are not sub-formulas of a temporal operator. When assigning the literals in ϕ according to i and o, we prune and select some "obvious choices" which can immediately be decided, using the fact that we are generating a game. This pruning is an important part of our approach, as in practice it can prune a significant portion of the possible locations. Function tree applies recursively a set of rules. We now describe these rules in the order in which they are applied in each recursion step. Figure 1 provides a formal description.


$$tree(c \lor \psi, i, o) := [\![c \in o]\!] \tag{1}$$

$$tree(u \wedge \psi, i, o) := \bot \tag{2}$$

$$tree(\psi, i, o) := \begin{cases} tree(\psi[c/\top]_T, i, o) & \text{if } c \in o\\ \bot & \text{if } c \notin o \end{cases} \qquad c \in ActL(\psi),\ \neg c \notin ActL(\psi) \tag{3}$$

$$tree(\psi, i, o) := tree(\psi[u/\perp]_T, i, o) \qquad u \in ActL(\psi),\ \neg u \notin ActL(\psi) \tag{4}$$

$$tree(\psi, i, o) := \psi[u/[\![u \in i]\!]]_T \qquad u, \neg u \in ActL(\psi) \tag{5}$$

$$tree(\psi, i, o) := \psi[c/[\![c \in o]\!]]_T \tag{6}$$

Figure 1: Let u ∈ I and c ∈ O. For simplicity of the presentation we leave out the commutative and associative cases and negative literals. ActL(ψ) denotes the set of literals appearing in the Boolean top-level of ψ. The formula ψ[ap/v]<sub>T</sub> is obtained from ψ by replacing ap by v ∈ {⊤, ⊥} for all occurrences of ap at the Boolean top-level, but only there. After each replacement we simplify the formula by doing constant folding. ⟦x ∈ X⟧ is ⊤ if x ∈ X and ⊥ if x ∉ X.
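The top-level substitution with constant folding used by these rules can be sketched as follows. This is an illustrative Python fragment under assumed encodings: literals are `("lit", name, positive)`, Boolean connectives are `("and", φ, ψ)` and `("or", φ, ψ)`, and every temporal subformula is kept as an opaque tuple that the substitution does not enter. It covers only the substitution step, not the pruning rules themselves.

```python
TOP, BOT = ("top",), ("bot",)


def lit(name, positive=True):
    return ("lit", name, positive)


def fold(op, left, right):
    """Constant folding for a single conjunction or disjunction."""
    if op == "and":
        if BOT in (left, right):
            return BOT
        if left == TOP:
            return right
        if right == TOP:
            return left
    else:
        if TOP in (left, right):
            return TOP
        if left == BOT:
            return right
        if right == BOT:
            return left
    return (op, left, right)


def subst_top(phi, ap, value):
    """Replace proposition ap by the Boolean value at the Boolean top-level only."""
    kind = phi[0]
    if kind == "lit":
        _, name, positive = phi
        if name != ap:
            return phi
        return TOP if value == positive else BOT
    if kind in ("and", "or"):
        return fold(kind, subst_top(phi[1], ap, value), subst_top(phi[2], ap, value))
    return phi  # constants and temporal operators are left untouched
```

Applied to a conjunction of the negative literal for g with two opaque temporal subformulas, substituting true for g folds the whole formula to ⊥, while substituting false leaves exactly the two temporal conjuncts, which mirrors the two transitions computed in Example 1.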


Function to: Handling Time-out. A consequence of invariant (2) is that only timers with index 0, i.e., of the form t<sup>d</sup><sub>0</sub>, can time out, since the timers are ordered. In addition, timers that do not appear inside a formula should not time out (this is enforced by squeeze), as we show later. Note that this does not apply to timers with duration 1 as these time out immediately. We direct impossible time-outs to ⊤ since they do not occur. Hence, to(T, ϕ) := ⊤ if for some t<sup>d</sup><sub>i</sub> ∈ T we have that i ≠ 0, or d > 1 and t<sup>d</sup><sub>i</sub> ∉ Timers(ϕ). Otherwise, to(T, ϕ) is defined by applying the following transformations on all subformulas of ϕ and timing out timers t ∈ T: we transform ◇[t]ψ ↦ ⊥, ◯[t]ψ ↦ ψ, and φ W[t]ψ ↦ ⊤. After applying to we do constant folding, as parts of the formula may become irrelevant.

Function squeeze: Determining remapping and reset of timers. When applying the functions tree and to, some timers might become unused. Hence, we have to ensure that invariant (3) holds and, as stated in the previous paragraph, reset all timers that do not appear in the formula. We define squeeze(ϕ) := (e, ψ) as follows: For each duration d, let t<sup>d</sup><sub>i<sub>j</sub></sub> ∈ Timers(ϕ) with indices i<sub>0</sub> < i<sub>1</sub> < i<sub>2</sub> < ... be the remaining timers with sorted indices i<sub>j</sub>. Then set e(t<sup>d</sup><sub>j</sub>) := t<sup>d</sup><sub>i<sub>j</sub></sub> if i<sub>j</sub> exists and e(t<sup>d</sup><sub>j</sub>) := RESET otherwise. ψ is obtained by replacing the timers t<sup>d</sup><sub>i<sub>j</sub></sub> by t<sup>d</sup><sub>j</sub>.

Function introExp: Expansion and Timer Introduction. The function introExp performs the formula expansion and introduces new timers if necessary. The expansion guarantees that invariant (1) holds afterwards. When introducing new timers, invariant (2) and invariant (3) also have to be maintained. This is achieved by assigning, for each bound b with associated duration d, the timer with the next unused index, i.e. t<sup>d</sup><sub>j</sub> ∉ Timers(ϕ) where t<sup>d</sup><sub>0</sub>, ..., t<sup>d</sup><sub>j−1</sub> ∈ Timers(ϕ). Let I(d) := max{i | t<sup>d</sup><sub>i</sub> ∈ Timers(ϕ)} + 1 be the next unused index. In addition, as timers t<sup>d</sup><sub>i</sub> with i > d do not exist by invariant (2), expansions generating them are redirected to ⊤. Hence, we define introExp(ϕ) := rd(iE<sub>I</sub>(ϕ)) where rd(ϕ) := ⊤ if for some i > d we have t<sup>d</sup><sub>i</sub> ∈ Timers(ϕ), and rd(ϕ) := ϕ otherwise. The function iE<sub>I</sub> performing the expansion is defined by

$$\begin{array}{lcl} iE_I(l) & := & l \\ iE_I(\varphi \circ \psi) & := & iE_I(\varphi) \circ iE_I(\psi) \\ iE_I(\Diamond[n]\varphi) & := & iE_I(\varphi) \vee \Diamond[t_{I(n+1)}^{n+1}]\varphi \\ iE_I(\Diamond[t]\varphi) & := & iE_I(\varphi) \vee \Diamond[t]\varphi \\ iE_I(\bigcirc[n]\varphi) & := & \bigcirc[t_{I(n)}^{n}]\varphi \\ iE_I(\bigcirc[t]\varphi) & := & \bigcirc[t]\varphi \\ iE_I(\varphi \, \mathcal{W}[n]\psi) & := & iE_I(\psi) \vee iE_I(\varphi) \wedge (\varphi \, \mathcal{W}[t_{I(n+1)}^{n+1}]\psi) \\ iE_I(\varphi \, \mathcal{W}[t]\psi) & := & iE_I(\psi) \vee iE_I(\varphi) \wedge (\varphi \, \mathcal{W}[t]\psi) \\ iE_I(\varphi \, \mathcal{W}\psi) & := & iE_I(\psi) \vee iE_I(\varphi) \wedge (\varphi \, \mathcal{W}\psi) \end{array}$$

where l ∈ {ap, ¬ap}, ∘ ∈ {∧, ∨}, n ∈ N and t ∈ T .
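The index bookkeeping performed by squeeze, and the choice of the next unused index during timer introduction, can be sketched on sets of timers alone. In this illustrative Python fragment a timer is a pair `(d, i)` of duration and index, `used` is the set of timers occurring in the current formula, and the returned `rename` map stands in for the rewriting of the formula. These names, the `RESET` marker, and the convention that the next unused index is 0 when no timer of that duration is in use are assumptions of the example.

```python
RESET = "RESET"  # assumed marker for the reset effect


def squeeze(used, durations):
    """Compute the effect e and the timer renaming for the timers still in use."""
    effect, rename = {}, {}
    for d in durations:
        remaining = sorted(i for (dur, i) in used if dur == d)
        for j in range(d + 1):  # timers of duration d have indices 0..d
            if j < len(remaining):
                # The new timer with index j takes over the j-th remaining timer.
                effect[(d, j)] = (d, remaining[j])
                rename[(d, remaining[j])] = (d, j)
            else:
                effect[(d, j)] = RESET
    return effect, rename


def next_unused_index(used, d):
    """Index of the timer that a new expansion with duration d would receive."""
    indices = [i for (dur, i) in used if dur == d]
    return max(indices) + 1 if indices else 0
```

For the situation in Example 1 after the time-out, `squeeze({(101, 0)}, [10, 101])` maps the timer `(10, 0)` to `RESET` and `(101, 0)` to itself, and `next_unused_index({(101, 0)}, 101)` yields index 1 for the newly introduced timer.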

Function opt: Formula Simplification. The function opt ensures that the constructed set of locations L is finite, by simplifying the formulas in order to avoid introducing infinitely many logically equivalent formulas. Since we must maintain the invariants, the simplification does not guarantee uniqueness modulo equivalence. Nevertheless, it ensures finiteness of L and performs optimizations.

Definition of UNSAFE and Correctness To complete the construction of the countdown-timer game, we define the set of unsafe locations as UNSAFE <sup>L</sup> = {⊥}. The proof of the correctness theorem below is given in the full version [13].

Theorem 1. Let Φ ∈ SafeLTL<sup>B</sup> and G be the countdown-timer game structure constructed from Φ as described above. Then there exists a system realizing L(Φ) if and only if the system wins in the countdown-timer game (G, UNSAFE <sup>L</sup>).

We augment the construction with several extensions to improve its efficiency and expand its scope. For instance, we combine explicit expansion with timer-based implicit expansion, which allows us to handle directly operators like single . We also use approximation to handle simple assumptions of the form ψ where ψ is fully bounded, i.e., without W. Details can be found in the full version [13].

### 5 Solving Countdown-Timer Games

We now describe the second phase of our synthesis algorithm, namely the solving of the countdown-timer game generated from the SafeLTL<sup>B</sup> specification. In a countdown-timer game, the durations of the timers, which correspond to the bounds of the temporal operators in the specification, are encoded in binary. Hence, the set V of timer valuations and thus also the safety game defined in Section 3 grow exponentially in the size of the countdown-timer game. Since our goal is to efficiently solve countdown-timer games with large durations, explicitly constructing and solving the semantic safety game is not desired. We note, however, that in the worst case it is not possible to avoid this blowup. This is stated in the next theorem, the proof of which is given in the full version [13].

Theorem 2. Solving countdown-timer games is EXPTIME-complete.

This means that solving countdown-timer games efficiently requires an approach that manipulates sets of timer valuations symbolically, in order to avoid, if possible, explicit enumeration. We propose a symbolic algorithm for solving countdown-timer games that additionally employs an iteratively refined approximation. The method is applicable to generic symbolic representations of the set of timer valuations. We present an instantiation of the method with a representation composed of intervals of timer values and partial orders on timers.

Symbolic Game Solving The standard way to solve a safety game is to compute the set of states from which the environment can enforce reaching an unsafe state, and check if it intersects with the set of initial states. If this is the case, then the environment wins the game, and otherwise the system wins.

For a game (G, UNSAFE) with G = (S, S<sub>0</sub>, I, O, ρ), the set of states from which the environment can enforce reaching UNSAFE is called the environment attractor and is defined as AttrE<sub>G</sub>(UNSAFE) = {s ∈ S | ∃π : env. strategy. ∀σ : sys. strategy. ∃k ∈ N. Outcome(s, π, σ)<sub>k</sub> ∈ UNSAFE}. The environment wins the safety game if and only if AttrE<sub>G</sub>(UNSAFE) ∩ S<sub>0</sub> ≠ ∅.
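For an explicitly given safety game, the environment attractor can be computed as a least fixed point. The sketch below is a minimal Python illustration, assuming that in each step the environment first picks an input set and the system then picks an output set; the function names and the representation of ρ as a Python callable are choices made for this example only.

```python
from itertools import combinations


def powerset(items):
    """All subsets of a finite collection, as frozensets."""
    items = list(items)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def cpre_env(states, inputs, outputs, rho, target):
    """States where some input forces the successor into target for every output."""
    ins, outs = powerset(inputs), powerset(outputs)
    return {s for s in states
            if any(all(rho(s, i, o) in target for o in outs) for i in ins)}


def env_attractor(states, inputs, outputs, rho, unsafe):
    """Least fixed point of adding enforceable predecessors to the unsafe states."""
    attr = set(unsafe)
    while True:
        new = attr | cpre_env(states, inputs, outputs, rho, attr)
        if new == attr:
            return attr
        attr = new


def environment_wins(states, initial, inputs, outputs, rho, unsafe):
    return bool(env_attractor(states, inputs, outputs, rho, unsafe) & set(initial))
```

This explicit computation enumerates all states, which is exactly what becomes infeasible when the state space is the product of locations and timer valuations.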

We solve the countdown-timer game by computing a symbolic representation of the attractor of the environment player to the unsafe locations. We assume a symbolic representation Rep of the space of timer valuations 2<sup>V</sup>. For each R ∈ Rep we denote with ⟦R⟧ ⊆ V the subset of V represented by R. We represent subsets of the state space L × V of the semantic safety game using functions L → Rep, where U ∈ (L → Rep) represents {(l, v) | v ∈ ⟦U(l)⟧}.

The symbolic enforceable predecessor for the environment CPreE<sub>symb</sub> : (L → Rep) → (L → Rep) is defined as follows. For U ∈ (L → Rep), we let

$$CPreE_{symb}(U) := \lambda l. \bigcup_{i \subseteq \mathcal{I}} \bigcap_{o \subseteq \mathcal{O}} \bigcup_{T \subseteq \mathcal{T}} \operatorname{symTrans}(\delta(l, i, o, T), T, U), \text{ where}$$

$$\text{symTrans}((l',e),T,U) := \text{inc}(\text{effTO}(T,\text{remap}(e,\text{effReset}(e,U(l')))))$$

is the symbolic backward application of transition δ(l, i, o, T) to the target set ⟦U(l′)⟧. The operations that symTrans requires, from last to first, are as follows.


We also require that we can perform the set operations ∪ and ∩ and equality checking between elements of Rep, in order to perform the computation.

We employ the symbolic enforceable predecessor operator CPreE<sub>symb</sub> to compute a symbolic representation of the environment attractor AttrE<sub>symb</sub> as follows. We set AttrE<sup>0</sup><sub>symb</sub> := (λl. if l ∈ UNSAFE<sup>L</sup> then V else ∅), and then for n ∈ N we let AttrE<sup>n+1</sup><sub>symb</sub> := AttrE<sup>n</sup><sub>symb</sub> ∪ CPreE<sub>symb</sub>(AttrE<sup>n</sup><sub>symb</sub>).

Proposition 1. If (G<sup>T</sup> , UNSAFE<sup>L</sup>) is a countdown-timer game with G<sup>T</sup> = (T , d, L, L<sub>0</sub>, I, O, δ) and the safety game (G, UNSAFE<sup>L</sup> × V) with G = (L × V, L<sub>0</sub> × {λt.d(t)}, I, O, ρ) is its semantics, then for the symbolic attractor computed above it holds that ⟦AttrE<sub>symb</sub>(l)⟧ = {v ∈ V | (l, v) ∈ AttrE<sub>G</sub>} for every l ∈ L.

Approximation of Timer Valuations As the symbolically represented state space described above might still lead to exploring a large number of sets, we perform an over- and under-approximation of the attractor of explored states.

We use a threshold k ∈ N to control the precision of the abstraction. Intuitively, when approximating for t ∈ T we would like to treat exactly the timer values at the "border", i.e. timer values in [0, k] and [d(t) − k, d(t)], since these matter for time-outs and resets. Our approximations over : Rep → Rep and under : Rep → Rep treat the intermediate values [k, d(t) − k] like a single value block. The over-approximation over(R) adds all intermediate values if one value from R is inside [k, d(t) − k], and the under-approximation under(R) removes all intermediate values if one value from R is not inside. Formally:

$$\begin{aligned} approx_k(t, I) &:= \left(I \cap [k,d(t)-k] \neq \emptyset\right) \wedge \left([k,d(t)-k] \not\subseteq I\right) \\ [\![over(R)]\!] &:= \left\{\lambda t. \begin{cases} v(t) \cup [k,d(t)-k] & \text{if } approx_k(t,v(t)) \\ v(t) & \text{otherwise} \end{cases} \; \middle|\; v \in [\![R]\!] \right\} \\ [\![under(R)]\!] &:= \left\{\lambda t. \begin{cases} v(t) \setminus [k,d(t)-k] & \text{if } approx_k(t,v(t)) \\ v(t) & \text{otherwise} \end{cases} \; \middle|\; v \in [\![R]\!] \right\} \end{aligned}$$

The attractor computation is now done as follows: We start with k := 1. For the current k we compute the environment attractor once using under- and once using over-approximation at each symbolic state in the computation. If the environment wins in the under-approximation, it wins the concrete game. If the system wins in the over-approximation, it wins the concrete game. If neither holds, we set k := 2 · k and repeat. This always terminates since for k > d(t)/2 the approximations become exact, and hence, one player wins for sure.
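The approximation and the refinement loop can be sketched for a single timer whose values are represented as one closed interval. This Python fragment is an illustration under assumptions: intervals are pairs `(a, b)` with `None` for the empty interval, `d` is the timer's duration, and the callbacks `env_wins_under` and `sys_wins_over` are hypothetical stand-ins for the two attractor computations, not functions of the authors' tool.

```python
def middle(k, d):
    """The block of intermediate values [k, d - k], or None if it is empty."""
    return (k, d - k) if k <= d - k else None


def approx(iv, k, d):
    """True if iv meets the middle block without containing all of it."""
    mid = middle(k, d)
    if mid is None or iv is None:
        return False
    (a, b), (lo, hi) = iv, mid
    intersects = a <= hi and lo <= b
    contains = a <= lo and hi <= b
    return intersects and not contains


def over(iv, k, d):
    """Over-approximation: add the whole middle block."""
    if not approx(iv, k, d):
        return iv
    (a, b), (lo, hi) = iv, middle(k, d)
    return (min(a, lo), max(b, hi))


def under(iv, k, d):
    """Under-approximation: remove the whole middle block."""
    if not approx(iv, k, d):
        return iv
    (a, b), (lo, hi) = iv, middle(k, d)
    if a < lo:
        return (a, lo - 1)
    if b > hi:
        return (hi + 1, b)
    return None  # the interval lies entirely inside the middle block


def solve_with_refinement(env_wins_under, sys_wins_over):
    """Double the threshold until one of the two approximations is conclusive."""
    k = 1
    while True:
        if env_wins_under(k):
            return "E", k
        if sys_wins_over(k):
            return "S", k
        k *= 2
```

Once k exceeds half of the duration, `middle` is empty and both approximations return the interval unchanged, which corresponds to the termination argument given in the text. The loop itself terminates only if the supplied callbacks become conclusive at that point.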

Example 2. Consider a countdown-timer game, some transitions of which are depicted in Fig. 2a. From the depicted transitions, only the transition from l<sub>2</sub> to ⊥ has a non-empty time-out set, {t<sup>1000</sup><sub>0</sub>}. Since the timer t<sup>1000</sup><sub>0</sub> has duration 1000, computing AttrE<sub>symb</sub> for the locations l<sub>1</sub> and l<sub>2</sub> precisely would require 1000 iterations. Employing over-approximation with threshold k = 3, on the other hand, reaches a fixed point in 7 iterations, as shown in Fig. 2b. This is helpful in cases like the one in the game in Fig. 2a, where the choice of transition in location l<sub>0</sub> is controlled by the system (via the output o). Here, the over-approximation allows the solving algorithm to quickly determine that the choice of transition to l<sub>1</sub> is losing, while the system can win via the alternative transition.

Symbolic Representation using Boxes As a symbolic domain we chose an interval representation augmented with partial orders over timers, Rep := 2<sup>PartialOrder(T) × 2<sup>Rec</sup></sup>, where Rec := { i ∈ (T → N × N) | ∀t ∈ T , (a, b) = i(t). 0 ≤ a ≤ b ≤ d(t)} are the intervals in the form of a hyper-cube. Intuitively, we have a set of partial orders and for each of them we have a set of hyper-cubes. Formally:

$$[\![R]\!] := \bigcup_{(p,C)\in R} \left( \left\{ v \in \mathcal{V} \mid \forall (t_1 \sim t_2) \in p : v(t_1) \sim v(t_2) \right\} \cap \bigcup_{r \in C} \lambda t. \left[r(t)_1, r(t)_2\right] \right)$$

where r(t)<sub>i</sub> is the i-th projection of r(t). It remains to define the necessary operations: inc, effReset, effTO, and remap are mostly straightforward according to their definition, as they can be performed by modifying and inspecting all intervals individually or just reordering timers. Additionally, effReset uses the partial order to derive bounds on timers that are in relation with a timer that is reset. effTO refines the partial order, since on time-out T, all timers in T are smaller than those in 𝒯 \ T, where 𝒯 is the set of all timers. The approximations can also be performed point-wise on the intervals, as an approximate interval is again an interval.

We chose this domain since it is simple, and, at the same time, due to the use of partial orders, well suited for the type of problem we are solving. Our solving algorithm is generic and can accommodate other, more sophisticated domains.
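A membership test for this box domain can be sketched as follows. In this illustrative Python fragment an element of Rep is a list of pairs `(order, cubes)`, where `order` is a list of constraints `(t1, op, t2)` and each cube maps every timer to a closed interval. The restriction of the relation to `<` and `<=` and all names are assumptions of the example, since the text does not fix them.

```python
import operator

# Assumed set of order relations between timer values.
RELATIONS = {"<": operator.lt, "<=": operator.le}


def respects(v, order):
    """Check that the valuation satisfies every constraint of the partial order."""
    return all(RELATIONS[op](v[t1], v[t2]) for (t1, op, t2) in order)


def in_cube(v, cube):
    """Check that every timer value lies in its interval of the hyper-cube."""
    return all(lo <= v[t] <= hi for t, (lo, hi) in cube.items())


def member(v, rep):
    """Valuation v is represented if some partial order and one of its cubes admit it."""
    return any(
        respects(v, order) and any(in_cube(v, cube) for cube in cubes)
        for order, cubes in rep
    )
```

For instance, with `rep = [([("t0", "<", "t1")], [{"t0": (0, 5), "t1": (0, 10)}])]`, the valuation `{"t0": 2, "t1": 7}` is a member, whereas `{"t0": 5, "t1": 3}` lies in the cube but violates the order and is therefore not represented.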

### 6 Evaluation

We implemented<sup>1</sup> and evaluated our approach. We compare our prototype implementation to ebr-ltl-synth introduced in [9] which performs synthesis for LTLEBR. We also compare to the state-of-the-art LTL synthesis tool strix version 21.0.0 [19, 22]. In the following, we present the benchmarks we used, the experiments, and the results. We ran all experiments on an Intel Core i7-1165G7 processor with 16GB RAM and a single core available. All times are wall-clock times. A detailed description of the benchmarks is given in the full version [13].

Bounded Response Benchmarks In our first set of experiments we evaluate the tools on LTLEBR formulas from [9], and on 23 SYNTCOMP 2021 benchmarks<sup>2</sup> that fall into LTLEBR and are used for a similar comparison in [9]. Figure 3 and Figure 4 show the respective runtimes, with a time-out of one minute. Unfortunately, for roughly half of the benchmarks from [9] strix rejected the input formula as too long, since the bounded operators must be expanded explicitly upon input. We therefore left strix out of this comparison. Figure 3 shows that on the benchmarks from [9] both our implementation and ebr-ltl-synth have roughly the same runtime, ignoring different startup times. Figure 4 shows that on the selected SYNTCOMP benchmarks all three tools are comparable.

Figure 3: Execution times in milliseconds on the benchmarks [9].

Figure 4: Execution times in milliseconds on the LTLEBR SYNTCOMP benchmarks.

<sup>1</sup> Available at: https://github.com/phheim/lisynt

<sup>2</sup> https://github.com/SYNTCOMP/benchmarks

These experiments evaluate our implementation on relevant benchmarks that are partially not designed in the spirit of the problem that our approach targets. The results show that our implementation is comparable to existing tools.

Adaption of Real-Time Benchmarks In our second set of experiments, we took MTL synthesis problems from [14] and adapted them to SafeLTL<sup>B</sup> formulas. The benchmarks include a conveyor belt (conv-belt), a robot camera (robo-cam), and several parametrized instances of a multiple railroad-crossings controller (rail). We discretized the real-time bounds. The benchmarks use up to 19 propositions and 16 bounded operators, and bounds between 60 and 4000. Detailed results can be found in Table 1. ebr-ltl-synth was not applicable to these benchmarks as we had to use assumptions (which cannot be captured by the specifications in the LTLEBR fragment) to model the timed environment.

Table 1: Results on the office-robot and adapted real-time benchmarks. |L| and |T | are the numbers of locations and timers in the generated countdown-timer game. τ<sub>Gen</sub> is the runtime of the game generation in seconds. k is the approximation threshold on which the solving terminated. Win. shows whether the system (S) or the environment (E) wins. τ<sub>Σ</sub> is the total runtime including the game generation and solving, where TO means a time-out after 15 minutes. τ<sub>strix</sub> is the runtime of strix. For some benchmarks strix rejects the input for being too long (F), which is due to expanding the bounded operators when using strix.

These experiments show that SafeLTL<sup>B</sup> can express interesting requirements from the real-time domain by appropriate discretization. We did not compare directly to the tool in [14], as the underlying modeling formalism is different, and hence we adapted the benchmarks. However, a superficial comparison of our results to those in [14] shows that our tool compares well (and is in some cases better). Furthermore, on these benchmarks our tool clearly outperforms strix.

Office Robot Benchmarks Our last set of experiments considers benchmarks we created ourselves. They consist of a number of specifications describing tasks for a robot in an office building with four rooms. The benchmarks are parametrized by the number of rooms that have to be serviced. They use up to 11 propositions and 14 bounded temporal operators, and bounds between 10 and 21600. Detailed results can be found in Table 1. ebr-ltl-synth was either not applicable due to the use of assumptions (4 benchmarks) or timed out (25 benchmarks).

The results show that SafeLTL<sup>B</sup> can express meaningful synthesis tasks, and that our approach is viable for solving them. Furthermore, they show that our method indeed fulfills its purpose: for specifications requiring large bounds in the temporal operators our method clearly outperforms the state-of-the-art tools.

Overall Analysis Table 1 shows that the countdown-timer game generation is very efficient compared to the solving. As we expect to be able to improve the solving by more sophisticated symbolic techniques, we expect the countdown-timer game based approach to be viable for even more complex properties. In most cases the solver terminated with a low approximation threshold, which shows the usefulness of approximation. In our experience, without approximation solving the benchmarks with large bounds becomes infeasible with our current technique.

### 7 Conclusion

We presented a new synthesis approach for specifications expressed in an extension of Safety LTL with bounded temporal operators. A distinguishing feature of our method is that it is specifically targeted at efficiently solving the synthesis problem for specifications with bounded temporal operators, in particular those with large bounds. Our evaluation results show that our technique performs very well on a range of benchmarks featuring such timing requirements. The key to this success is a novel translation to a safety game with symbolically represented bounds, whose efficiency is due to the use of effective pruning techniques. We observe that our method for solving the generated game is viable, as shown by the evaluation. However, it has potential for further improvement by employing more performant symbolic representations and abstraction techniques.

#### Data-Availability Statement

The datasets generated and/or analysed during the current study are available in the Zenodo repository, https://doi.org/10.5281/zenodo.7505914.

### References


ification, 19th International Conference, CAV 2007, Berlin, Germany, July 3-7, 2007, Proceedings. Lecture Notes in Computer Science, vol. 4590, pp. 95–107. Springer (2007). https://doi.org/10.1007/978-3-540-73368-3\_12



## Lockstep Composition for Unbalanced Loops

Ameer Hamza and Grigory Fedyukovich

Florida State University, Tallahassee, FL, USA, ahamza@fsu.edu, grigory@cs.fsu.edu

Abstract. Equivalence checking of two programs is often reduced to the safety verification of a so-called product program that aligns the programs in lockstep. However, this strategy is not applicable when programs have arbitrary loop structures, e.g., the numbers of loops vary. We introduce an automatic iterative abstraction-refinement-based technique for checking equivalence of a single-loop program and a program which has a series of consecutive loops. Our approach decomposes the single loop into a sequence of separate loops, thus reducing the main problem to a series of equivalence-checking problems for pairs of loops. Since, due to the decomposition, these problems become abstract, our approach iteratively refines the decomposed loops and lifts useful information across them. Our second contribution is a procedure for the alignment of loops with counters and explicit bounds that cannot be composed in lockstep. We have implemented the approach and successfully evaluated it on two suites, one with benchmarks containing different numbers of loops and the other containing benchmarks that need alignment.

### 1 Introduction

To gain performance benefits, optimizing compilers perform program transformations such as loop peeling, loop unrolling, and loop unswitching. The reliance on many transformations lowers the trust in the computation and motivates us to use automated SMT-based verification to verify equivalence of the program before and after the transformation. Specifically, one should prove that for any equal inputs to both programs, their outputs are equal too. The problem is often reduced to construction of a product program by aligning (or merging) the instructions in lockstep and then determining if the product program meets a safety specification represented by the original relational specification. While effective for many pairs of programs that are relatively close to each other, this strategy may be insufficient for pairs of loopy programs with arbitrary control flow. We target the verification of pairs of programs in which the source program has a single loop, and the target program has a sequence of non-nested loops. Such programs have been extensively studied in the literature [4,23,31] but still are challenging for automated reasoning.

Before proving equivalence, our approach decomposes the loop in the source program into multiple loops such that the structure of this new program exactly matches the one in the target program. With two structurally similar programs at hand, our approach targets pairs of loops and creates a lockstep composition for each pair. This lets us break our equivalence checking problem into smaller isolated problems, and if each such problem is successfully solved, then the given programs are indeed equivalent. An obvious downside of decomposition is the loss of context: if a program property is defined before the first loop, it may not be available for the second and later loops. For that reason, we have to refine the decomposition by extracting the requested properties in the previously considered pairs of loops and pulling them to the currently-considered loops. Technically, this process is driven by counterexamples.

S. Sankaranarayanan and N. Sharygina (Eds.): TACAS 2023, LNCS 13994, pp. 270–288, 2023. https://doi.org/10.1007/978-3-031-30820-8_18

Moreover, when attempting to create a lockstep composition for loops that have different numbers of iterations, we might need to align them. When our method can compute an exact number of iterations of both the source and the target, it rearranges the control flow in the source by grouping the iterations in the loop and extracting selected iterations to either before or after the loop. Such rearranging helps with programs where the number of iterations of one loop is a multiple of the other, or is off by a few iterations, which is common for optimizations including loop vectorization and loop peeling.

We implemented our equivalence checking algorithm, along with the algorithms to refine and align the loops, in a tool called Alien. On many commonly used public benchmarks [23], Alien is an order of magnitude faster than the most recent (to our knowledge) state-of-the-art tool Counter [14]. Alien can prove equivalence of pairs of user-written programs, and it is not bound to any particular compiler, unlike many related tools based on translation validation.

We proceed with an overview of the related work in Sect. 2 and a motivating example in Sect. 3. Then, we formally introduce our problem in Sect. 4. The main ingredients of our algorithm are discussed in Sect. 5 and Sect. 6. The evaluation is reported in Sect. 7, and the conclusion in Sect. 8.

### 2 Related Work

Relational verification aims at analyzing two different programs or two executions of the same program. This research field has been extensively studied, but since it reduces to safety verification, it is known to be undecidable in general. Relational verification has applications in checking program equivalence, information-flow leakage, incremental verification, etc. To reduce to safety, it is a common practice to convert the programs into a product. The product can be used for relational verification tasks by providing an appropriate relational precondition and postcondition. This research trend was pioneered by Barthe et al. [3], who used product programs in Hoare-style proving. More recently, there has been a rise of automated product construction techniques, e.g., [7, 16, 25, 26].

Creating a product program requires that the two programs can be composed in some way, which is usually assumed to be trivial (e.g., lockstep) or provided to the verifier in some form. However, it is not always possible to get the trivial composition. The technique presented by Strichman et al. [36] extends the work of Godlin et al. [12] and attempts to prove equivalence of two recursive functions having different base cases and no lockstep composition by creating an alignment between them. However, the alignment is done using unrolling factors, which are manually provided by the user for both programs. The technique presented in [34] targets self-composition. It computes a scheduler for an asynchronous execution of both programs using counterexamples and a selection of predicates (e.g., from the user). A more recent work [38] is also scheduler-driven but mainly targets mutual termination rather than full functional equivalence.

Translation validation techniques [9, 17, 20, 22, 27, 28, 32, 35, 39] relate the source programs with their compiler outputs to check equivalence. However, it is usually the case that the compiler provides the manner of composition. Many data-driven techniques for proving equivalence, like [5, 33], rely on finding a trace alignment between concrete executions of the programs. Such techniques might perform inefficiently when a sufficient number of execution traces is not available. They might also require a lot of time for the data runs. The work in [22] performs bounded translation validation at the level of LLVM intermediate representation. The technique looks for a subset of behaviors of the source program in the target to infer equivalence. As the technique is bounded, it may not be sound.

The work by Gupta et al. [14] presents a counterexample-guided algorithm for translation validation of given programs. It explores the space of potential products to find a bisimulation relation between intermediate program locations of the two programs and proves it via the generation of strong enough inductive invariants. Again, while making the approach flexible, reliance on counterexamples makes it slower, and as we will see from our evaluation (Sect. 7), this approach does not scale well in cases where an alignment needs larger unrollings.

Many techniques use relational verification for regression verification, where two versions of a program are compared for equivalence checking [1, 2, 11, 13, 15, 19, 24, 30, 36, 37]. Such techniques usually assume that two programs are closely related, hence the analysis is usually reduced by either pruning out or abstracting common parts of the programs. Many techniques simplify the process of equivalence checking. Some assume a static relationship between the number of iterations of two loops, in order to prove equivalence [6, 11, 21, 29, 33]. Other techniques create finite unrollings of loops and prove equivalence until a certain bound, e.g., [1, 18, 22, 30]. Our work makes an attempt to relax such assumptions.

### 3 Illustration on Example

Fig. 1 gives two C programs: the source program contains a single loop, and the optimized target program contains two sequential loops. Our approach aims at proving the equivalence of the source and the target, that is, if variables are initially given equal values (b = d, M = X, K = Y), then their values at the end are equal too, i.e., a = c, b = d. A lockstep composition on the programs in Fig. 1 is challenging to construct: 1) it is difficult to compare one loop with two sequential loops, and 2) the programs take different numbers of iterations.

```
// Source
1 int M = nondet(), K = nondet(),
2     a = 0, N = 2*M+1+K, b = 2*M+1;
3 assume(M >= 0 && K >= 0);
4 while (a != N) {
5   b = (a >= b) ? b + 1 : b;
6   a++;
7 }

// Target
1 int X = nondet(), Y = nondet(),
2     c = 1, d = 2*X+1;
3 assume(X >= 0 && Y >= 0);
4 while (c < 2*X+1) c += 2;
5 while (c != 2*X+1+Y) {
6   d++;
7   c++;
8 }
```
Fig. 1: Source (top) and target (bottom) programs.

Fig. 2: Decomposed (left) and refined (right) source programs.

Our method decomposes the source loop into two loops to make it easier to create a product program. It creates two copies of the loop in the source with the same loop body but different loop guards, shown in Fig. 2 (left). Specifically, it uses the loop guard for the first loop in the target program, i.e., c < 2\*X+1, to create a < 2\*M+1 and add it to the guard of the first source loop. It then checks the equivalence of pairs of loops from the decomposed source and the target. However, the first pair of loops (lines 4-7 in the decomposed source, line 4 in the target) is not in lockstep, as for each iteration of the target, the source is expected to iterate twice. Thus, we attempt to construct a lockstep composition by grouping two iterations of the first loop in the decomposed source. However, this results in some residual iterations to be processed before the loop in the decomposed source. After conducting an analysis on the initial states of both loops and the body of the source loop, our approach moves one iteration to before the loop in the source. This is sufficient to complete the lockstep composition and prove that the first pair of loops are equivalent.

Similarly, the approach considers the second pair of loops (lines 8-11 in the decomposed source, lines 5-8 in the target). To prove that the loops are in lockstep and equivalent, we are missing the information that N = 2\*M+1+K and b = 2\*M+1, which is available at the beginning of the program, but not in the middle of it. We say that these equalities refine the composition of the second loops, and they are added as an assumption before the start of the second loop (the refined source program is given in Fig. 2 (right)). The refinement makes it possible to both create the lockstep composition and prove the equivalence of both pairs of loops. The analysis terminates with the verdict that both programs are equivalent.
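The decomposed and refined source programs are described above only in prose, so the following is a minimal C sketch of what they could look like. It assumes that the first loop receives the extra guard a < 2\*M+1 and that the learned equalities are restated as an assertion before the second loop. The names `Out`, `source_decomposed`, `target`, and `agree_up_to` are hypothetical, and the inputs are passed as parameters instead of being drawn by nondet(). The final function only tests agreement on a bounded range of inputs; it is not a proof of equivalence.

```c
#include <assert.h>

/* Final values of the two observable variables of a program. */
typedef struct { int x; int y; } Out;

/* Hypothetical rendering of the decomposed and refined source.
   M and K are parameters here instead of nondet() values. */
Out source_decomposed(int M, int K) {
    assert(M >= 0 && K >= 0);                 /* assume(M >= 0 && K >= 0) */
    int a = 0, N = 2*M + 1 + K, b = 2*M + 1;
    /* First loop: original guard strengthened with a < 2*M+1. */
    while (a != N && a < 2*M + 1) {
        b = (a >= b) ? b + 1 : b;
        a++;
    }
    /* Refinement: facts known at the program entry, restated here. */
    assert(N == 2*M + 1 + K && b == 2*M + 1);
    /* Second loop: original guard only. */
    while (a != N) {
        b = (a >= b) ? b + 1 : b;
        a++;
    }
    return (Out){ a, b };
}

/* Target program of Fig. 1, with X and Y as parameters. */
Out target(int X, int Y) {
    assert(X >= 0 && Y >= 0);
    int c = 1, d = 2*X + 1;
    while (c < 2*X + 1) c += 2;
    while (c != 2*X + 1 + Y) { d++; c++; }
    return (Out){ c, d };
}

/* Bounded test: returns 1 if the outputs agree for all 0 <= M, K <= bound. */
int agree_up_to(int bound) {
    for (int M = 0; M <= bound; M++)
        for (int K = 0; K <= bound; K++) {
            Out s = source_decomposed(M, K), t = target(M, K);
            if (s.x != t.x || s.y != t.y) return 0;
        }
    return 1;
}
```

For instance, with M = X = 3 and K = Y = 4, both functions return the pair (11, 11), i.e., a = c and b = d at the end.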

### 4 Preliminaries

We follow the Satisfiability Modulo Theories (SMT) background and notation to present the contributions. The goal of SMT is either to find an assignment to variables of a first-order logic formula that makes it true (written M ⊨ φ, where M is a model and φ is a formula), or to prove its non-existence (also called unsatisfiability, denoted φ ⟹ ⊥). For formulas φ, ψ, if every model of φ satisfies ψ, we say that φ is logically stronger than ψ (written φ ⟹ ψ). We write ite for an if-then-else.

#### 4.1 Constrained Horn Clauses

Throughout the paper, we use the notion of Constrained Horn Clauses (CHCs) as a means to represent programs containing an arbitrary number of loops.

Definition 1. A Constrained Horn Clause C over a set of uninterpreted relation symbols ℛ is an (implicitly universally quantified) formula in first-order logic that has the form of one of the three implications (namely a fact, an inductive clause, and a query, respectively):

$$\phi(V\_1) \implies L\_1(V\_1) \qquad \qquad L\_1(V\_1) \land \dots \land L\_n(V\_n) \land \psi(V\_1, \dots, V\_{n+1}) \implies L\_{n+1}(V\_{n+1})$$

$$L\_1(V\_1) \land \dots \land L\_k(V\_k) \land \pi(V\_1, \dots, V\_k) \implies \bot$$

where for all i, L<sub>i</sub> ∈ ℛ are uninterpreted predicate symbols, V<sub>i</sub> are implicitly quantified vectors of variables, and some L<sub>i</sub> and L<sub>j</sub> might be the same. All formulas φ, ψ, π are fully interpreted.

Throughout, we assume that each single loop is represented by two CHCs, e.g.:

$$Init(V) \implies L(V) \qquad\qquad L(V) \land GTr(V, V') \implies L(V')$$

where Init represents the initial state of the loop, and GTr(V, V′) represents one iteration of the loop, which we call a guarded transition. For convenience, we split GTr(V, V′) into Tr(V, V′) ∧ G(V), where G encodes a guard over the variables at the beginning of the transition, and Tr has no additional guard.

Definition 2. Given a set of uninterpreted predicates ℛ and a set of CHCs over ℛ, we say that the set of CHCs is satisfiable if there exists an interpretation for every predicate in ℛ that makes all its implications valid.

Solutions for CHC systems are called inductive invariants. If a CHC system is unsatisfiable, there exists a counterexample showing that a bad state is reachable.

#### 4.2 Relational Verification

The problems of equivalence checking and lockstep composability are instances of a more general problem of relational verification. In this section, we introduce it in a simple case for two systems containing a single loop each.

Definition 3. Given two single-loop CHC systems over L<sub>{1,2}</sub> ∈ ℛ with initial states Init<sub>{1,2}</sub> and guarded transition bodies GTr<sub>{1,2}</sub>, resp., a relational precondition pre and a relational postcondition post, the problem of relational verification can be formulated as the satisfiability of the following CHC system:

$$\begin{aligned} Init\_1(V) &\implies L\_1(V, V) & Init\_2(V) &\implies L\_2(V, V) \\ L\_1(V\_0, V) \land GTr\_1(V, V') &\implies L\_1(V\_0, V') & L\_2(V\_0, V) \land GTr\_2(V, V') &\implies L\_2(V\_0, V') \\ pre(V\_0, W\_0) \land L\_1(V\_0, V) \land L\_2(W\_0, W) \land \neg post(V, W) &\implies \bot \end{aligned}$$

Here, both loop systems are augmented with an additional variable (as the first argument of L<sub>{1,2}</sub>) to keep track of the initial values of variables.

To solve the problem, formulated as a complex nonlinear CHC, we need to find individual invariants for both loops, which is difficult [7,25]. Instead, we aim at simplifying the problem for certain classes of programs. Specifically, it often can be reduced to safety verification via so-called lockstep composition.

Definition 4 (Lockstep-composability). Given two single-loop CHC systems and a relational precondition pre, a lockstep composition exists if 1) the following CHC system is satisfiable:

$$\begin{aligned} pre(V\_1, V\_2) \land Init\_1(V\_1) \land Init\_2(V\_2) &\implies L\_{1,2}(V\_1, V\_2) \\ L\_{1,2}(V\_1, V\_2) \land GTr\_1(V\_1, V'\_1) \land GTr\_2(V\_2, V'\_2) &\implies L\_{1,2}(V'\_1, V'\_2) \\ L\_{1,2}(V\_1, V\_2) \land G\_1(V\_1) \neq G\_2(V\_2) &\implies \bot \end{aligned}$$

where L<sub>1,2</sub> ∈ ℛ is an uninterpreted predicate symbol, an interpretation of which corresponds to a relational invariant, and G<sub>1</sub> and G<sub>2</sub> represent the loop guards, and 2) the body of the first CHC is satisfiable.

Intuitively, the first CHC constrains the values of input variables to be related through pre (and also, pre should be consistent with both Init-s). The second CHC encodes a synchronous computation of both loops. The third CHC ensures that inside the product loop both G<sub>1</sub> and G<sub>2</sub> are true, and outside the loop both G<sub>1</sub> and G<sub>2</sub> are false. This implies that the numbers of steps in two lockstep-composable programs under some pre are the same.

The following lemma lets us reduce a relational verification problem to a safety verification problem computed after merging the loops and then use existing invariant generation techniques for solving relational verification problems. Note that due to the lockstep, both loop guards are always equal, so it is enough to conjoin the negation of only one of the loop guards to the query.

Lemma 1. Given a relational verification problem over two systems over L<sub>{1,2}</sub> ∈ ℛ representing single loops, pre, and post, if the systems are lockstep-composable under pre, and the following CHC problem is satisfiable, then post holds at the end of these loops.

$$\begin{aligned} pre(V\_1, V\_2) \land Init\_1(V\_1) \land Init\_2(V\_2) &\implies L\_{1,2}(V\_1, V\_2) \\ L\_{1,2}(V\_1, V\_2) \land GTr\_1(V\_1, V'\_1) \land GTr\_2(V\_2, V'\_2) &\implies L\_{1,2}(V'\_1, V'\_2) \\ L\_{1,2}(V\_1, V\_2) \land \neg G\_1(V\_1) \land \neg post(V\_1, V\_2) &\implies \bot \end{aligned}$$

The problem of proving program equivalence is a special case of the relational verification problem where pre = post is a pairwise equality over V<sub>1</sub> and V<sub>2</sub>.
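To make the lockstep conditions concrete, the following C sketch simulates a product of two loops on one pair of concrete states. It is only an executable illustration: the definitions above ask for an inductive invariant over all states, whereas this code checks a single run. The loops are the second source loop and the second target loop of the example in Sect. 3; the type and function names (`Src`, `Tgt`, `run_lockstep`) and the step limit are assumptions of the sketch.

```c
#include <assert.h>

typedef struct { int a, b, M, K, N; } Src;   /* source variables */
typedef struct { int c, d, X, Y; } Tgt;      /* target variables */

enum Verdict { POST_HOLDS, GUARDS_DIFFER, POST_FAILS, OUT_OF_STEPS };

/* Runs the two loops synchronously from the given pair of states. */
enum Verdict run_lockstep(Src s, Tgt t, int max_steps) {
    for (int step = 0; step <= max_steps; step++) {
        int g1 = (s.a != s.N);                  /* guard G_1 */
        int g2 = (t.c != 2*t.X + 1 + t.Y);      /* guard G_2 */
        if (g1 != g2) return GUARDS_DIFFER;     /* query of Def. 4 fires */
        if (!g1)                                /* both loops exit: check post */
            return (s.a == t.c && s.b == t.d) ? POST_HOLDS : POST_FAILS;
        /* One synchronous step: both guarded transitions together. */
        s.b = (s.a >= s.b) ? s.b + 1 : s.b;
        s.a++;
        t.d++;
        t.c++;
    }
    return OUT_OF_STEPS;
}
```

Starting from a = b = c = d = 5 with M = X = 2, K = Y = 3, and N = 8, the run ends with `POST_HOLDS`. If N is set to 0 instead, so that it is no longer related to M and K, the target loop exits while the source loop does not, and the run ends with `GUARDS_DIFFER`.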

### 5 Equivalence Checking for Unbalanced Loops

In this section, we present our novel equivalence checking algorithm designed for the cases when the source and the target programs have different structures. We first describe a class of the input CHC systems that we target in Sect. 5.1. We then provide a procedure to decompose the source such that we can break the problem of equivalence checking under our limitations into a sequence of smaller problems in Sect. 5.2. We then finalize our core abstraction-refinement schema for the equivalence checker in Sect. 5.3.

#### 5.1 Input Limitations and Auxiliary Definitions

We support pairs of programs where the source contains a single loop, and the target possibly contains an arbitrary number of sequential loops. A CHC system of the latter sort that has n loops is called a flat n-sequence of loops further in the paper. Here and throughout, we assume that G<sub>S</sub> and G<sub>i</sub> encode the loop guard for the source loop and the ith loop in the target, and that Tr<sub>S</sub> and Tr<sub>i</sub> encode the respective loop bodies without the corresponding guards. Specifically, the shape of a source program that we consider is defined over a single predicate symbol S, and we thus refer to this system as the S-system later in the text:

$$\operatorname{Init}\_S(V\_S) \implies S(V\_S) \qquad \qquad S(V\_S) \land G\_S(V\_S) \land \operatorname{Tr}\_S(V\_S, V'\_S) \implies S(V'\_S)$$

The flat n-sequence is defined over predicate symbols T<sub>1</sub>, …, T<sub>n</sub>, and is referred to as the T-system in the paper:

$$\begin{aligned} Init\_T(V\_T) &\implies T\_1(V\_T) & T\_1(V\_T) \land G\_1(V\_T) \land Tr\_1(V\_T, V'\_T) &\implies T\_1(V'\_T) \\ T\_1(V\_T) \land \neg G\_1(V\_T) &\implies T\_2(V\_T) & T\_2(V\_T) \land G\_2(V\_T) \land Tr\_2(V\_T, V'\_T) &\implies T\_2(V'\_T) \\ & \dots & & \dots \\ T\_{n-1}(V\_T) \land \neg G\_{n-1}(V\_T) &\implies T\_n(V\_T) & T\_n(V\_T) \land G\_n(V\_T) \land Tr\_n(V\_T, V'\_T) &\implies T\_n(V'\_T) \end{aligned}$$

There is one fact CHC, in which Init<sub>T</sub> represents the initial state of the program. There are n inductive clauses, i.e., for each i ∈ [1, n], the ith inductive clause has an occurrence of symbol T<sub>i</sub> on both sides of the implication. There are also n − 1 non-inductive clauses that encode transitions between adjacent loops, so ¬G<sub>i</sub> represents the condition when loop i exits.

Example 1. The source in Fig. 1 is encoded to CHCs as follows:

$$a = 0 \land N = 2 \ast M + 1 + K \land b = 2 \ast M + 1 \land M \ge 0 \land K \ge 0 \implies S(a, b, M, K, N)$$

$$S(a, b, M, K, N) \land a \ne N \land a' = a + 1 \land b' = ite(a \ge b, b + 1, b) \implies S(a', b', M, K, N)$$
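As an illustration of what a solution of this system looks like, the following C sketch checks a candidate interpretation of S against both CHCs by brute force over a small range of values. The candidate `inv_S` is an assumption of this sketch and is not an invariant stated in the paper; a bounded enumeration can refute a candidate but does not prove that it is inductive in general.

```c
#include <assert.h>

static int max_int(int x, int y) { return x > y ? x : y; }

/* Candidate interpretation of S(a, b, M, K, N) (an assumption). */
int inv_S(int a, int b, int M, int K, int N) {
    return M >= 0 && K >= 0 && N == 2*M + 1 + K
        && 0 <= a && a <= N && b == max_int(2*M + 1, a);
}

/* Checks both CHCs for all 0 <= M, K <= bound. Only N = 2*M+1+K is
   enumerated, since the candidate is false for every other N.
   Returns 1 if no clause is violated. */
int check_S_bounded(int bound) {
    for (int M = 0; M <= bound; M++)
        for (int K = 0; K <= bound; K++) {
            int N = 2*M + 1 + K;
            /* Fact: the initial state satisfies the candidate. */
            if (!inv_S(0, 2*M + 1, M, K, N)) return 0;
            /* Inductive clause: candidate, guard, and one step
               imply the candidate on the next state. */
            for (int a = 0; a <= N; a++)
                for (int b = 0; b <= N + 1; b++) {
                    if (!inv_S(a, b, M, K, N) || a == N) continue;
                    int a2 = a + 1;
                    int b2 = (a >= b) ? b + 1 : b;
                    if (!inv_S(a2, b2, M, K, N)) return 0;
                }
        }
    return 1;
}
```

The candidate records the two facts that matter later in the paper: N stays equal to 2\*M+1+K, and b stays equal to 2\*M+1 as long as a has not passed it.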

Example 2. The CHC encoding of the target program in Fig. 1 is given as:

$$\begin{aligned} c = 1 \land d = 2 \ast X + 1 \land X \ge 0 \land Y \ge 0 \implies T\_1(c, d, X, Y) \\ T\_1(c, d, X, Y) \land c < 2 \ast X + 1 \land c' = c + 2 \implies T\_1(c', d, X, Y) \\ T\_1(c, d, X, Y) \land c \ge 2 \ast X + 1 \implies T\_2(c, d, X, Y) \\ T\_2(c, d, X, Y) \land c \ne 2 \ast X + 1 + Y \land c' = c + 1 \land d' = d + 1 \implies T\_2(c', d', X, Y) \end{aligned}$$

We introduce a concept needed for the presentation in the next section, where by φ[x/y] we denote the expression φ with all instances of y replaced by x:

Definition 5. Given a CHC system 𝒮 over predicate symbols L<sub>1</sub>, …, L<sub>n</sub>, an L<sub>i</sub>-projection of 𝒮 (denoted 𝒮|<sub>i</sub>) is defined as {C[⊤/L<sub>j</sub>(·)] | C ∈ 𝒮, j ≠ i}.

That is, our projection replaces all applications of all predicate symbols except L<sub>i</sub> by true. Clearly, some CHCs can then be simplified to true, and we assume that they are removed from the projection.

Example 3. Let 𝒮 be the T-system from Example 2; then 𝒮|<sub>2</sub> has two CHCs:

$$\begin{aligned} c \ge 2 \ast X + 1 \implies T\_2(c, d, X, Y) \\ T\_2(c, d, X, Y) \land c \ne 2 \ast X + 1 + Y \land c' = c + 1 \land d' = d + 1 \implies T\_2(c', d', X, Y) \end{aligned}$$
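The effect of the projection on the shape of a system can be sketched in C as follows. The encoding is an assumption of this sketch: each clause is reduced to the index of the predicate in its body and in its head (0 meaning that no predicate occurs), constraints are omitted, and only systems without queries and with at most one body predicate per clause are covered. The names `Clause` and `project` are hypothetical.

```c
#include <assert.h>

/* Predicate skeleton of a CHC: body and head predicate indices,
   with 0 standing for "no predicate". */
typedef struct { int body; int head; } Clause;

/* Writes the i-projection of sys[0..n) to out (which must have room for
   n clauses) and returns its size. Applications of predicates other than
   i become true: a clause whose head becomes true is dropped, and a body
   application of another predicate is erased. */
int project(const Clause *sys, int n, int i, Clause *out) {
    int m = 0;
    for (int k = 0; k < n; k++) {
        if (sys[k].head != i) continue;   /* head is true: clause removed */
        out[m].head = i;
        out[m].body = (sys[k].body == i) ? i : 0;
        m++;
    }
    return m;
}
```

For the four clauses of Example 2, encoded as {0,1}, {1,1}, {1,2}, {2,2}, the projection on index 2 keeps two clauses: the transition between the loops, which becomes a fact, and the inductive clause of the second loop.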

#### 5.2 Equivalence Checking by Decomposition

Our main insight on checking equivalence of a source loop and a flat n-sequence is that if the source breaks into n distinct loop-chunks, and if each of these chunks is equivalent to the corresponding loop from the n-sequence, then the actual programs are equivalent too. We thus present a decomposition of the source into a sequence of n new loops that gives us the basis for comparing the two CHC systems. A decomposition of the S-system into a flat n-sequence is done as follows:


$$\begin{aligned} Init\_S(V\_S) &\implies S\_1(V\_S) \\ S\_1(V\_S) \land G\_S(V\_S) \land P\_1(V\_S) \land Tr\_S(V\_S, V'\_S) &\implies S\_1(V'\_S) \\ S\_1(V\_S) \land \neg(G\_S(V\_S) \land P\_1(V\_S)) &\implies S\_2(V\_S) \\ &\dots \\ S\_n(V\_S) \land G\_S(V\_S) \land Tr\_S(V\_S, V'\_S) &\implies S\_n(V'\_S) \end{aligned}$$

For any interpretation of P<sub>1</sub>, …, P<sub>n−1</sub>, the CHC system constructed above is equivalent to the S-system, for the following three reasons. First, no matter how many iterations the first n − 1 loops conduct, all the remaining ones will be conducted in the last loop. Second, all loops still use the original guard G<sub>S</sub>, and if it is exceeded in some ith loop, then all the remaining (i+1)th, …, nth loops will be just skipped. Lastly, all these loops perform exactly the same operations as the original loop since Tr<sub>S</sub> is copied to all of them. We will instantiate all the P<sub>i</sub> predicates on demand in our CounterExample Guided Abstraction Refinement (CEGAR) loop.
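The claim that the choice of the extra predicate does not matter can be observed on the running example. In the C sketch below, the predicate of the first loop is instantiated as a < cut for an arbitrary integer cut chosen by the caller. This instantiation and the function names are hypothetical, and the comparison is a test on concrete inputs rather than a proof.

```c
#include <assert.h>

/* Original single-loop source of Fig. 1; returns the final value of b
   (the final value of a always equals N). */
int source_b(int M, int K) {
    assert(M >= 0 && K >= 0);
    int a = 0, N = 2*M + 1 + K, b = 2*M + 1;
    while (a != N) { b = (a >= b) ? b + 1 : b; a++; }
    return b;
}

/* Two-loop decomposition with the first predicate chosen as a < cut. */
int decomposed_b(int M, int K, int cut) {
    assert(M >= 0 && K >= 0);
    int a = 0, N = 2*M + 1 + K, b = 2*M + 1;
    while (a != N && a < cut) { b = (a >= b) ? b + 1 : b; a++; }  /* loop 1 */
    while (a != N)            { b = (a >= b) ? b + 1 : b; a++; }  /* loop 2 */
    return b;
}

/* Returns 1 if every cut from -1 to N+1 yields the original result. */
int same_for_all_cuts(int M, int K) {
    int N = 2*M + 1 + K;
    for (int cut = -1; cut <= N + 1; cut++)
        if (decomposed_b(M, K, cut) != source_b(M, K)) return 0;
    return 1;
}
```

A cut below 0 makes the first loop empty, and a cut above N makes the second loop empty; in both extremes, the last loop or the first loop alone performs all iterations of the original loop.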

The CEGAR loop for our equivalence checking problem is outlined in Alg. 1. It begins with decomposing the S-system into a flat n-sequence, as defined above. The P-predicates are created from the guards in the T-system by rewriting T-variables to S-variables, for i ∈ [1, n − 1]:

$$P\_i(V) \stackrel{\text{def}}{=} \exists V'. G\_i(V') \land pre(V, V')$$

Algorithm 1: DecomposeAndCheck(S, T, pre, post)

```
Input: S-system, T-system, relational pre- and post-conditions
       pre = ⟨pre_1, pre_2, ..., pre_n⟩ and post = ⟨post_1, post_2, ..., post_n⟩
Output: res ∈ ⟨equiv, unknown⟩
 1 S′ ← decompose(S, T);
 2 for i ← 1; i ≤ n; i ← i + 1 do
 3   S_i ← S′|_i; T_i ← T|_i;
 4   while true do
 5     aligned ← ⊥; refined_{1,2} ← ⊥;
 6     ST_i ← getProduct(S_i, T_i, pre_i);
 7     Let Init be the body of the fact CHC in ST_i;
 8     res ← checkSAT(Init);
 9     if res then
10       ⟨inv, cex⟩ ← checkSAT(ST_i ∪ {L_i ∧ (G_S ∧ P_i) ≠ G_i ⟹ ⊥});
11     if ¬res ∨ cex ≠ ∅ then
12       ⟨aligned, S_i⟩ ← alignCHCs(S_i, T_i, pre_i);
13       if aligned then continue;
14     else
15       ⟨inv, cex⟩ ← checkSAT(ST_i ∪ {L_i ∧ ¬G_i ∧ ¬post_i ⟹ ⊥});
16       if cex = ∅ then break;
17       ⟨refined_1, S_1, ..., S_i⟩ ← refine(S_1, ..., S_i, cex);
18       ⟨refined_2, T_1, ..., T_i⟩ ← refine(T_1, ..., T_i, cex);
19     if ¬(refined_1 ∨ refined_2 ∨ aligned) then return unknown;
20 return equiv;
```
Note that the relational precondition pre is assumed to be a conjunction of equalities. The decomposition gives us two flat n-sequences, which lets us consider pairs of loops (line 2) from both systems separately. Each such CHC system is created by applying the projection from Def. 5. In a sense, this is an abstraction of the original system since by isolating one loop (say, the ith), we lose the state computed all the way from the entry to the program by iterating i − 1 loops. Aiming to check equivalence for each pair of projections, the algorithm first figures out how/if a lockstep composition is applicable. We write res ← checkSAT(φ) to denote a satisfiability check for a (first-order) formula φ, and we write:

$$\langle inv, cex \rangle \leftarrow \text{checkSAT}(ST\_i \cup \{L\_i \land \dots \implies \bot\})$$

to denote this check for the CHC product ST<sub>i</sub> over predicate symbol L<sub>i</sub> with respect to the query written in {…}. The check returns either an inductive invariant (i.e., an interpretation of L<sub>i</sub>) or a counterexample. Before checking for lockstep, the compatibility of the initial states needs to be checked, i.e., whether the body of the fact is satisfiable (line 8). If it succeeds, each check of lockstep-composability is reduced by Def. 4 to a CHC satisfiability check, and it uses both guards in the CHC query (line 9). If either the initial-states check or the lockstep check fails, the algorithm uses a method for alignment of projections discussed in detail in Sect. 6. If aligned, we continue with the next iteration of the loop, attempting to prove lockstep composition and equivalence of the projections.


Algorithm 2: refine(S<sub>1</sub>, …, S<sub>i</sub>, cex)

    Input: Set of CHC systems S_1, ..., S_i over ℛ; and counterexample cex
    Output: res ∈ ⟨⟨⊥, ·⟩, ⟨⊤, refined systems S_1, ..., S_i⟩⟩
     1 if i = 1 then return ⟨⊥, ·⟩;
     2 while cex ≠ ∅ do
     3   ⟨inv, cex′⟩ ← checkSAT(S_{i−1} ∪ {L_{i−1}(V) ∧ ¬G_{i−1}(V) ∧ ⋀_{v ∈ V} v = cex(v) ⟹ ⊥});
     4   if cex′ = ∅ then
     5     assert(inv ≠ ∅);
     6     Fact ← {C ∈ S_i | C has form Init(V) ⟹ L_i(V)};
     7     S_i ← S_i ∖ {Fact} ∪ {Init(V) ∧ inv(V) ⟹ L_i(V)};
     8     return ⟨⊤, S_1, ..., S_i⟩;
     9   else
    10     ⟨res, S_1, ..., S_{i−1}⟩ ← refine(S_1, ..., S_{i−1}, cex′);
    11     if ¬res then return ⟨⊥, ·⟩;

Example 4. Recall the CHC systems defined in Examples 1 and 2. In the first iteration, Alg. 1 considers the first pair of loops. The initial-states check at line 8 fails, and thus the loops are aligned at line 12 (to be explained in Example 8).

Whenever two CHC systems are in lockstep, the algorithm utilizes Lemma 1 and checks the product system computed for two isolated loops (line 15) for safety. The success of the check lets the algorithm continue with the next pair of loops. Otherwise, we receive a counterexample, which might be spurious because of the abstraction. Our refinement procedure then searches for a strengthening of either of the CHC systems (lines 17-18), which is described in more detail in the next subsection. If it cannot refine further using the given technique, it returns unknown (line 19).

#### 5.3 Refinement

Due to the decomposition presented in the previous section, there could be sensitive information that is available in the earlier parts of the programs, but not in the later parts. Alg. 2 gives a refinement procedure needed to propagate useful properties about the programs towards queries. Intuitively, we have to strengthen our relational preconditions, thus improving the chances to prove the safety of the ith CHC product. Recall that in Alg. 1, refinement is invoked for each counterexample, which is technically an assignment to the variables at the initial state of either of the programs being composed into the product CHC.

The key idea is to check if the counterexample is spurious by constructing a scenario in which the (i − 1)th system can eventually reproduce the values from the counterexample at the end of its execution (line 3). This is reduced to a satisfiability check of the corresponding CHC system w.r.t. the "negation" of the counterexample. If it succeeds, then an inductive invariant can be used to strengthen (line 7) the ith system. Otherwise, the algorithm might recursively descend to refining the (i − 1)th system via finding an invariant for the (i − 2)th product, and so on (line 10). For this reason, the algorithm has the while-loop (line 2) that lets it repeat the satisfiability check for some (already strengthened) systems, and it continues until the current system has been refined.

Example 5. Continuing with Example 4, in the second iteration of Alg. 1, the lockstep check<sup>1</sup> does not succeed:

$$\begin{aligned} a = c \land b = d \land M = X \land Y = K \land (a = N \lor a \ge 2 \ast M + 1) \land c \ge 2 \ast X + 1 &\implies L\_2(V) \\ L\_2(V) \land a \ne N \land a' = a + 1 \land b' = ite(a \ge b, b + 1, b) \land {} \\ c \ne 2 \ast X + 1 + Y \land c' = c + 1 \land d' = d + 1 &\implies L\_2(V') \\ L\_2(V) \land (a \ne N) \neq (c \ne 2 \ast X + 1 + Y) &\implies \bot \end{aligned}$$

For the CHC system above, a counterexample could be cex = {a, b, c, d ↦ 110, M, X ↦ 50, N ↦ 0, K, Y ↦ 50} because we miss that N = 2\*M + 1 + K, hence lockstep is not possible. Alg. 2 then confirms that this counterexample is spurious by learning this inductive invariant. After adding it to the fact CHC of S<sub>2</sub> and recomputing the product system ST<sub>2</sub>, it becomes satisfiable. We then add the following query for the equivalence check:

$$L\_2(V) \land c = 2 \ast X + 1 + Y \land (a \neq c \lor b \neq d \lor M \neq X \lor K \neq Y) \implies \bot$$

which fails because of the missing invariant b = 2\*M + 1. After adding it to the fact CHC of S<sub>2</sub> and recomputing the product CHC system, it becomes satisfiable.

As can be seen from this example, the refinement procedure is beneficial for both the lockstep-composability and the equivalence checks in Alg. 1; thus, the inner loop in the algorithm can iterate multiple times before terminating with a positive verdict. We note that inductive invariants are in general tricky to find. Thus, our approach has essential limitations and cannot prove equivalence of programs that require complicated (e.g., quantified) inductive invariants.
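The spuriousness check can be mimicked on concrete values. The C sketch below replaces the CHC satisfiability check of Alg. 2 with a plain simulation, which is possible here only because the first source loop is deterministic once M and K are fixed; the paper's procedure is symbolic and additionally yields an invariant. The names `Src`, `exit_of_first_loop`, and `is_spurious` are hypothetical, and the tested state (N = 0 with M = K = 50) is chosen in the spirit of the counterexample of Example 5.

```c
#include <assert.h>

typedef struct { int a, b, M, K, N; } Src;

/* State of the source at the exit of its first decomposed loop,
   for given inputs M and K. */
Src exit_of_first_loop(int M, int K) {
    assert(M >= 0 && K >= 0);
    Src s = { 0, 2*M + 1, M, K, 2*M + 1 + K };
    while (s.a != s.N && s.a < 2*s.M + 1) {
        s.b = (s.a >= s.b) ? s.b + 1 : s.b;
        s.a++;
    }
    return s;
}

/* Returns 1 if the source part of a counterexample cannot occur at the
   entry of the second loop. M and K are never modified, so the only
   candidate run is the one started with cex.M and cex.K. */
int is_spurious(Src cex) {
    if (cex.M < 0 || cex.K < 0) return 1;   /* excluded by the assumption */
    Src r = exit_of_first_loop(cex.M, cex.K);
    return !(r.a == cex.a && r.b == cex.b && r.N == cex.N);
}
```

For M = K = 50, the first loop exits with a = b = 101 and N = 151, so a state with N = 0 is reported as spurious, which matches the role of the equality N = 2\*M + 1 + K in the refinement.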

### 6 Aligning Unbalanced Loops

In this section, we present an algorithm for creating an alignment between two single-loop CHC systems that have different numbers of loop iterations. Our new method of alignment of an S-projection and a T-projection is based on restructuring the former to become lockstep-composable with the latter. The algorithm identifies if any iterations of the former have to be extracted and placed before the loop and if any iterations have to be grouped and performed at once. These numbers (called alignment bounds in the rest of the section) are identified if exact loop bounds of both projections are computable.
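The restructuring can be tried out on the first pair of loops of the example in Sect. 3. The C sketch below takes the two numbers as parameters, here called `peel` and `group` (hypothetical names), performs `peel` source iterations before the loop and `group` source iterations per target iteration, and reports whether the guards agree and a = c holds at every product state. It checks one concrete run and assumes the relational precondition M = X.

```c
#include <assert.h>

/* One iteration of the source loop body. */
static void src_step(int *a, int *b) {
    *b = (*a >= *b) ? *b + 1 : *b;
    (*a)++;
}

/* Guard of the first decomposed source loop. */
static int src_guard(int a, int M, int N) {
    return a != N && a < 2*M + 1;
}

/* Returns 1 if, after the restructuring, the first source loop and the
   first target loop run in lockstep on inputs M = X and K. */
int lockstep_after_alignment(int M, int K, int peel, int group) {
    assert(M >= 0 && K >= 0 && peel >= 0 && group >= 1);
    int a = 0, N = 2*M + 1 + K, b = 2*M + 1;   /* source initial state */
    int c = 1, X = M;                          /* target initial state */
    for (int p = 0; p < peel; p++) {           /* iterations before the loop */
        if (!src_guard(a, M, N)) return 0;
        src_step(&a, &b);
    }
    for (;;) {
        int g_src = src_guard(a, M, N);
        int g_tgt = (c < 2*X + 1);
        if (g_src != g_tgt || a != c) return 0;
        if (!g_src) return 1;                  /* both loops exit together */
        for (int g = 0; g < group; g++) {      /* iterations done at once */
            if (!src_guard(a, M, N)) return 0;
            src_step(&a, &b);
        }
        c += 2;                                /* one target iteration */
    }
}
```

With `peel` = 1 and `group` = 2, the check succeeds, which agrees with the illustration in Sect. 3; without any restructuring (`peel` = 0, `group` = 1), it already fails on the initial states, since a = 0 and c = 1.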

#### 6.1 Finding the Number of Iterations

We aim first at computing a function that returns the exact number of iterations of a single loop in terms of input variables, based on the CHC representation.

<sup>1</sup> We abbreviate ⟨a, b, M, K, N, c, d, X, Y⟩ with V, and ⟨a′, b′, M, K, N, c′, d′, X, Y⟩ with V′.

In the technique presented below, the input systems need to have a counter variable that monotonically increments between two extremes that do not change in the loop.<sup>2</sup> Focusing on a single-loop CHC system with initial states Init and guarded transition body G ∧ Tr, where G encodes a guard over the variables at the beginning of the transition and Tr has no additional guard, we wish to find the exact number of iterations of the corresponding loop. In general, for that, we could consider an augmented CHC system with a fresh decrementing counter.

Definition 6. The exact number of iterations is an interpretation of the function symbol N that makes the augmented CHC system satisfiable:

$$\begin{aligned} Init(V) \land j = \mathcal{N}(V) &\implies L(V, j) \\ L(V, j) \land G(V) \land Tr(V, V') \land j' = j - 1 &\implies L(V', j') \\ L(V, j) \land \neg G(V) \land j \neq 0 &\implies \bot \end{aligned}$$

For an arbitrary loop, finding N is difficult and often not possible (e.g., for problems with nondeterminism in the loop). However, for some CHC systems encoding range-based loops, i.e., that already have counters, we can attempt to synthesize N from the information obtained from the syntax of CHCs. Specifically, we assume that formula Init has the form i = S(V) ∧ Init′(V, i) for some variable i and some function S. We also assume that the guard of the transition has the form i < F(V) ∧ G′(V, i) for some function F, and Tr has the form i′ = i + D ∧ Tr′(V, i, V′, i′) for some positive constant D > 0.

Definition 7. A range-based CHC system is one that has the following form

$$\begin{aligned} Init'(V, i) \land i = \mathcal{S}(V) &\implies T(V, i) \\ T(V, i) \land i < \mathcal{F}(V) \land i' = i + \mathcal{D} \land G'(V, i) \land Tr'(V, i, V', i') &\implies T(V', i') \end{aligned}$$

such that for some inductive invariant inv the following hold:

$$\text{Tr}'(V, i, V', i') \land inv(V, i) \implies \mathcal{S}(V) = \mathcal{S}(V') \tag{1}$$

$$\text{Tr}'(V, i, V', i') \land inv(V, i) \implies \mathcal{F}(V) = \mathcal{F}(V') \tag{2}$$

$$i < \mathcal{F}(V) \land inv(V, i) \implies G'(V, i) \tag{3}$$

To guarantee soundness of our construction, the constraints in the definition above ensure that S and F are the tightest bounds for the counter variable i. Specifically, (1) and (2) ensure that i has a lower and an upper bound that do not change throughout the execution, and (3) ensures that the loop does not break before i exceeds F(V). In simple cases the invariant inv could be just ⊤, but often it needs to carry important information from an initial state to an arbitrary iteration. For instance, if a loop has two counters with their own upper and lower bounds, then our analysis can proceed only when we can prove that one of the counters always exceeds its upper bound sooner than the other. Our running example makes another use of (3), to ensure that the residual guard G′(V, i) is weaker than i < F(V) strengthened by the invariant.

<sup>2</sup> A similar technique for a decrementing counter is straightforward but omitted for brevity of presentation.

Example 6. Recall the first loop of the decomposed source of Example 1. It has the guard a ≠ N ∧ a < 2\*M + 1. We can find the invariant N = 2\*M + 1 + K ∧ K ≥ 0. Clearly, since N = 2\*M + 1 + K ∧ K ≥ 0 ∧ a < 2\*M + 1 ⟹ a ≠ N, then F(V) ≝ 2\*M + 1 satisfies (3). With no invariant, a < 2\*M + 1 ⇏ a ≠ N.

Lemma 2. An integer function N computes the exact number of iterations for a range-based CHC system:

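$$\mathcal{N} \stackrel{\text{def}}{=} (\mathcal{F} - \mathcal{S}) \;\mathrm{div}\; \mathcal{D} + (\text{if } ((\mathcal{F} - \mathcal{S}) \bmod \mathcal{D} = 0) \text{ then } 0 \text{ else } 1)$$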
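As a sanity check of Lemma 2, the following Python sketch compares the closed form against a direct simulation of a range-based loop. The function names are hypothetical, and the sketch assumes S(V) ≤ F(V), which the statement of the lemma leaves implicit.

```python
def exact_iterations(start: int, bound: int, step: int) -> int:
    """Closed form of Lemma 2 with start = S(V), bound = F(V), step = D."""
    if step <= 0:
        raise ValueError("D must be a positive constant")
    if bound < start:
        # Assumption of this sketch: the lower bound does not exceed the upper bound.
        raise ValueError("sketch assumes S(V) <= F(V)")
    diff = bound - start
    return diff // step + (0 if diff % step == 0 else 1)


def simulated_iterations(start: int, bound: int, step: int) -> int:
    """Count iterations by running `i = S; while i < F: i += D` directly."""
    i, count = start, 0
    while i < bound:
        i += step
        count += 1
    return count
```

For the counters of the running example, `exact_iterations(0, 2 * M + 1, 1)` gives 2\*M + 1 for the source loop and `exact_iterations(1, 2 * X + 1, 2)` gives X for the target loop.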

In practice, the approach is limited by the capabilities of invariant generation. If a sufficient invariant for Def. 7 (and thus, Lemma 2) is found, the approach proceeds to align loops. Otherwise, it returns Unknown.

#### 6.2 Identifying Unrolling Depths

If the numbers of iterations can be computed, the approach proceeds to finding alignment bounds ℓ and m that define, respectively, the number of iterations to be extracted and placed before the loop and the number of iterations to be grouped and performed at once in the loop. These bounds are obtained from the following ingredients:


Values ℓ and m can be taken directly from a satisfying assignment to the variables v<sub>ℓ</sub> and v<sub>m</sub> of the following SMT query. Intuitively, it equates the total numbers of iterations in the S-projection and the T-projection:

$$\begin{aligned} Q\_{ST} \stackrel{\text{def}}{=} \exists v\_{\ell}, v\_{m} . \forall V\_{S}, V\_{T} . (v\_{\ell} \ge 0 \land v\_{m} > 0) \land pre(V\_{S}, V\_{T}) \implies \\ \mathcal{N}\_{S}(V\_{S}) - v\_{\ell} = v\_{m} \* \mathcal{N}\_{T}(V\_{T}) \end{aligned}$$

Thus, the SMT formula has the form of an implication: if pre holds, then the number of iterations of one program can be expressed over the number of iterations of another program (and vice versa). If M ⊨ Q<sub>ST</sub>, then ℓ ≝ M(v<sub>ℓ</sub>) and m ≝ M(v<sub>m</sub>).

Example 7. For the first projections in the decomposed source and the target, we generate the following (simplified) SMT query:

$$Q\_{ST} = \exists v\_\ell, v\_m . (v\_\ell \ge 0 \land v\_m > 0) \land M = X \implies 2 \ast M + 1 - v\_\ell = v\_m \ast X$$

and the solver generates the model M = {v<sub>ℓ</sub> ↦ 1, v<sub>m</sub> ↦ 2}, so ℓ = 1 and m = 2.
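The bounds of Example 7 can be reproduced without an SMT solver by a small bounded search. The sketch below is only an illustration: it replaces the ∃∀ query by enumeration of small candidate bounds checked on sampled inputs, so it is not a substitute for the solver-based query. The function names, the search bound, and the sample range are assumptions.

```python
from itertools import product


def n_source(M: int) -> int:
    """Iteration count of the source projection in Example 7: 2*M + 1."""
    return 2 * M + 1


def n_target(X: int) -> int:
    """Iteration count of the target projection in Example 7: X."""
    return X


def find_alignment(max_bound: int = 4, samples=range(0, 50)):
    """Search small (l, m) with l >= 0 and m > 0 such that
    N_S(M) - l == m * N_T(X) on every sampled input with pre: M == X."""
    for l, m in product(range(max_bound + 1), range(1, max_bound + 1)):
        if all(n_source(v) - l == m * n_target(v) for v in samples):
            return l, m
    return None  # no alignment within the explored bounds
```

On this example the search agrees with the model reported above, since 2\*M + 1 − 1 = 2 \* X whenever M = X.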

#### 6.3 Rearrangement of the Source Projection

Finally, we present the restructuring of the S-projection based on the two alignment bounds, ℓ and m, computed in the previous section. The former represents the number of iterations to be moved before the loop, and the latter represents the number of iterations to make a batch inside the loop.<sup>3</sup> We assume that an S-projection is defined using the following two CHCs over a single predicate symbol L: Init<sub>S</sub>(V) ⟹ L(V) and L(V) ∧ GTr<sub>S</sub>(V, V′) ⟹ L(V′).

We define an auxiliary predicate U(u, V, V′) that allows us to create an unrolling of arbitrary length: if u = 0, the result is the identity formula; otherwise we create u unrollings of the system (GTr<sub>S</sub> conjoined u times). We then define Init<sub>S</sub><sup>(ℓ)</sup> and GTr<sub>S</sub><sup>(m)</sup> as follows:

$$\begin{aligned} U(u, V, V') &\stackrel{\text{def}}{=} ite(u = 0,\ V' = V,\ \exists V'', \ldots, V^{(u)} \,.\, GTr_S(V, V'') \land \ldots \land GTr_S(V^{(u)}, V')) \\ Init_S^{(\ell)}(V') &\stackrel{\text{def}}{=} \exists V \,.\, Init_S(V) \land U(\ell, V, V') \\ GTr_S^{(m)}(V, V') &\stackrel{\text{def}}{=} U(m, V, V') \end{aligned}$$

Finally, we are ready to define the aligned CHC product used in Alg. 1 (align-CHCs(S, T, pre)).

Definition 8. Let S and T be two range-based CHC systems, as defined in Def. 7. Let M ⊨ Q<sub>ST</sub>(N<sub>S</sub>, N<sub>T</sub>, v<sub>ℓ</sub>, v<sub>m</sub>, pre), as defined in Sect. 6.2. Then, the rearranged system is defined as follows:

$$Init_S^{(\mathcal{M}(v_\ell))}(V) \implies L(V) \qquad L(V) \land GTr_S^{(\mathcal{M}(v_m))}(V, V') \implies L(V')$$

Note that the rearranged system and T are in lockstep, and the rearranged system is equivalent to S, both by construction. Thus, after such alignment, our Alg. 1 proceeds to checking the equivalence of S and T by means of checking the equivalence of the rearranged system and T.
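To make the rearrangement concrete, the sketch below treats a deterministic guarded transition as a Python function and builds the unrolling by chaining it. It is a simplified illustration rather than the CHC-level construction: only the counters of the running example are modelled (source counter a from 0 with step 1, target counter c from 1 with step 2, both bounded by 2\*M + 1 with M = X), and all names are hypothetical.

```python
from typing import Callable, Optional

State = dict
Step = Callable[[State], Optional[State]]  # returns None when the guard fails


def unroll(step: Step, u: int) -> Step:
    """Deterministic analogue of U(u, V, V'): identity if u == 0, else u chained steps."""
    def unrolled(state: State) -> Optional[State]:
        current: Optional[State] = dict(state)
        for _ in range(u):
            if current is None:
                return None
            current = step(current)
        return current
    return unrolled


def source_step(s: State) -> Optional[State]:
    # Guard a < 2*M + 1, update a' = a + 1.
    return {**s, "a": s["a"] + 1} if s["a"] < 2 * s["M"] + 1 else None


def target_step(t: State) -> Optional[State]:
    # Guard c < 2*X + 1, update c' = c + 2.
    return {**t, "c": t["c"] + 2} if t["c"] < 2 * t["X"] + 1 else None


def lockstep_ok(M: int, l: int = 1, m: int = 2) -> bool:
    """Run the rearranged source and the target side by side, checking a == c."""
    s = unroll(source_step, l)({"a": 0, "M": M})  # l iterations moved before the loop
    t = {"c": 1, "X": M}                          # target initial state, with pre: M = X
    batch = unroll(source_step, m)                # m iterations batched inside the loop
    while True:
        if s is None or s["a"] != t["c"]:
            return False
        s_next, t_next = batch(s), target_step(t)
        if (s_next is None) != (t_next is None):
            return False                          # one loop exits before the other
        if s_next is None:
            return True                           # both loops exit together
        s, t = s_next, t_next
```

With the bounds ℓ = 1 and m = 2 from Example 7, the two counters stay equal at every step and both loops exit together; with ℓ = 0 the initial states already disagree.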

Example 8. For the first projections in the decomposed source and the target, the lockstep check does not succeed because the body of the fact is unsatisfiable:

$$\begin{aligned} & a = c \land b = d \land M = X \land Y = K \land a = 0 \land N = 2 \ast M + 1 + K \land b = 2 \ast M + 1 \land M \ge 0 \land K \ge 0 \land {} \\ & c = 1 \land d = 2 \ast X + 1 \land X \ge 0 \land Y \ge 0 \implies L_1(a, b, M, K, N, c, d, X, Y) \end{aligned}$$

With the bounds computed in Example 7, we compute the following product:

$$\begin{aligned} & a = 0 \land N = 2 \ast M + 1 + K \land b = 2 \ast M + 1 \land M \ge 0 \land K \ge 0 \land {} \\ & a \ne N \land a < 2 \ast M + 1 \land a' = a + 1 \land b' = ite(a \ge b, b + 1, b) \land {} \\ & c = 1 \land d = 2 \ast X + 1 \land X \ge 0 \land Y \ge 0 \land a' = c \land b' = d \land M = X \land Y = K \\ & \qquad \implies L_1(a', b', M, K, N, c, d, X, Y) \\[1ex] & L_1(a, b, M, K, N, c, d, X, Y) \land a \ne N \land a < 2 \ast M + 1 \land a' = a + 1 \land b' = ite(a \ge b, b + 1, b) \land {} \\ & a' \ne N \land a' < 2 \ast M + 1 \land a'' = a' + 1 \land b'' = ite(a' \ge N, b' + 1, b') \land {} \\ & c < 2 \ast X + 1 \land c' = c + 2 \\ & \qquad \implies L_1(a'', b'', M, K, N, c', d, X, Y) \end{aligned}$$

<sup>3</sup> In practice, it may also be required to move some iterations to after the loop (and our implementation supports this). Then, we split ℓ into ℓ<sub>1</sub> + ℓ<sub>2</sub> heuristically and move ℓ<sub>1</sub> iterations to before the loop and ℓ<sub>2</sub> to after the loop.

### 7 Evaluation

We have implemented the algorithm for equivalence checking in a tool called Alien<sup>4</sup> on top of the invariant synthesizer FreqHorn that supports integers and arrays (over integers) [10]. Alien takes as input an S-system and a T-system, automatically decomposes the former, creates a sequence of product programs, and delegates the inductive invariant generation to FreqHorn. For solving SMT queries, it uses Z3 [8]. We considered two benchmark suites:


We considered the state-of-the-art tools LLREVE [16], an equivalence checker by Churchill et al. [5], Counter [14], and CHC-Product [25]. However, only Counter was able to solve some of our benchmarks in reasonable time: Churchill et al. report that the minimum time any benchmark takes to solve is around 2 hours, and it was largely outperformed by Counter in [14].

We thus evaluate Alien against Counter on both benchmark suites. To run Counter on a pair of manually provided C programs<sup>5</sup>, it was configured to apply no optimization to any of the programs. For TSVC benchmarks, we manually pass the unrolling factor 8 required by each benchmark (in contrast to our approach, in which the tool identifies this number automatically). For Alien, we provide two CHC encodings of the program, before and after the optimization. We specified a timeout of 15 minutes for both tools.

Alien solved 103 out of 104 TSVC benchmarks. Alien times out on the s279 benchmark because its invariant synthesizer struggles with finding a helper invariant. Benchmark s113 requires the approach to automatically synthesize an extra lemma (i.e., cnt>0), in addition to the variable equalities. Alien took 3.7 seconds to solve a benchmark on average: from 1.3 in the best case to 27.4 in the worst case. Among all, 26 (resp. 2) benchmarks require moving iterations before (resp. after) the loop. Counter proved equivalence for 15 benchmarks and failed to prove equivalence for 9 benchmarks, while the rest (81 benchmarks) timed out. Its minimum running time is 50.2 seconds, its maximum 704 seconds, and its average 117.4 seconds.

<sup>4</sup> The tool and benchmarks are available at https://github.com/a-hamza-r/aeval/tree/equiv-check.

<sup>5</sup> We consulted https://github.com/compilerai/counter to run the tool in our setting. Note that in their paper, the authors evaluated Counter only on compiler-optimized targets. Our case study is different, and it shows that checking equivalence between two arbitrary programs is a harder problem for Counter.

Fig. 3: Cactus plots (left: for TSVC benchmarks, right: for multi-phase benchmarks) comparing running times of ALIEN (blue line) and Counter (orange line).

For 24 multi-phase benchmarks, ALIEN proved all of them. Counter proved equivalence for 5 benchmarks, it failed to prove equivalence for 3 benchmarks, while the remaining benchmarks timed-out. The minimum, maximum and average times are 3.2, 32.6, and 11.5 seconds, respectively for ALIEN; and 43.8, 106.9, and 56.2 seconds respectively for Counter.

A larger picture on the experimental results is given in Fig. 3. The horizontal axes in the cactus plots represent time limit (logarithmic scale), and the vertical axes represent the numbers of benchmarks (linear scale) solved within the corresponding time limits. Intuitively, the plots demonstrate that Counter is an order of magnitude slower than our novel approach.

### 8 Conclusion

We have presented a novel CEGAR-based approach for checking equivalence of two programs containing a possibly different number of loops. The technique involves automatic decomposition of one of the programs to match the loop structure of the other, so that the task of equivalence checking of two given programs can be split into a sequence of tasks of equivalence checking of single loops, each of which is easier to solve. Since such decomposition comes at the cost of a possible loss of information, we developed a refinement schema that is intuitively based on propagation of lemmas on demand. Moreover, in case we deal with loops with a provably different number of iterations, our technique automatically rearranges the iterations in the loops, making them lockstep-composable for each subtask. We developed the Alien tool and empirically demonstrated that our approach to equivalence checking is more efficient than the state of the art on two classes of public benchmarks. In the future, it would be interesting to extend these techniques to more general program structures, e.g., where both programs have multiple and possibly nested loops.

Acknowledgments The work is supported in parts by a gift from Amazon Web Services and by the National Science Foundation grant 2106949.

### References



## Synthesis of Distributed Agreement-Based Systems with Efficiently-Decidable Verification

Nouraldin Jaber<sup>1</sup>, Christopher Wagner<sup>1</sup>, Swen Jacobs<sup>2</sup>, Milind Kulkarni<sup>1</sup>, and Roopsha Samanta<sup>1</sup>

<sup>1</sup> Purdue University, West Lafayette, USA {njaber,wagne279,milind,roopsha}@purdue.edu <sup>2</sup> CISPA Helmholtz Center for Information Security, Saarbrücken, Germany jacobs@cispa.de

Abstract. Distributed agreement-based (DAB) systems use common distributed agreement protocols such as leader election and consensus as building blocks for their target functionality. While automated verification for DAB systems is undecidable in general, recent work identifies a large class of DAB systems for which verification is efficiently-decidable. Unfortunately, the conditions characterizing such a class can be opaque and non-intuitive, and can pose a significant challenge to system designers trying to model their systems in this class.

In this paper, we present a synthesis-driven tool, Cinnabar, to help system designers building DAB systems ensure that their intended designs belong to an efficiently-decidable class. In particular, starting from an initial sketch provided by the designer, Cinnabar generates sketch completions using a counterexample-guided procedure. The core technique relies on compactly encoding root-causes of counterexamples to varied properties such as efficient-decidability and safety. We demonstrate Cinnabar's effectiveness by successfully and efficiently synthesizing completions for a variety of interesting DAB systems including a distributed key-value store and a distributed consortium system.

### 1 Introduction

Distributed system designers are increasingly embracing the incorporation of formal verification techniques into their development pipelines [8,10,13,31]. The formal methods community has been enthusiastically responding to this trend with a wide array of modeling and verification frameworks for prevalent distributed systems [29,17,15,32]. A desirable workflow for a system designer using one of these frameworks is to (1) provide a framework-specific model and specification of their system, and (2) automatically verify if the system model meets its specification.

However, the problem of algorithmically checking if a distributed system is correct for an arbitrary number of processes, i.e., the automated parameterized verification problem, is undecidable, even for finite-state processes [5,34]. To circumvent undecidability, the system designer must be involved, one way or another, in the verification process. Either the designer may choose a semi-automated verification approach and use their expertise to "assist" the verifier by providing inductive invariants [32,25,15,36], or the designer may choose a fully-automated verification approach that is only applicable to a restricted class of system models [16,17,24,7] and use their expertise to ensure that the model of their system belongs to the decidable class. This raises the question: for each workflow, how can we further simplify the system designer's task? While effective frameworks have been developed to aid the designer in discovering inductive invariants for the first workflow (e.g., Ivy [29], I4 [26]), there has been little emphasis on aiding the designer to build decidability-compliant models of their systems for the second workflow.

In this paper, we present a synthesis-driven approach to help system designers using the second workflow to build models that are both decidability-compliant and correct. Thus, our approach helps designers to construct models that belong to a decidable class for automated, parameterized verification, and can be automatically verified to be safe for any number of processes.

In particular, we instantiate this approach in a tool, Cinnabar, that targets an existing framework, QuickSilver, for modeling and automated verification of distributed agreement-based (DAB) systems [17]. Such systems use agreement protocols such as leader election and consensus as building blocks. QuickSilver enables modular verification of DAB systems by providing a modeling language, Mercury, that allows designers to model verified agreement protocols using inbuilt language primitives, and identifying a class of Mercury models for which the parameterized verification problem is efficiently decidable.

Unfortunately, this efficiently-decidable class of Mercury models is characterized using conditions that are rather opaque and non-intuitive, and can pose a significant challenge to system designers trying to model their systems in this class. The designer is responsible for understanding the conditions, and manually modifying their system model to ensure it belongs to the efficiently-decidable class of Mercury. This process can be both tedious and error-prone, even for experienced system designers.

Cinnabar demonstrates that synthesis can be used to automatically build models of DAB systems that belong to the efficiently-decidable fragment of Mercury and are correct.

#### Contributions. The key contributions of this paper are:


The initial stages focus on checking if a completed model is in the efficiently-decidable class while the latter stages focus on checking if a completed model is also correct. To enable efficiency, when a candidate completion fails at any stage, the architecture helps the learner avoid "similar" completions by extracting a root-cause of the failure and encoding the root-cause as an additional constraint for the learner. Each stage is equipped with a counterexample extraction strategy tailored to the property checked in that stage. The encoding procedure, on the other hand, is property-agnostic: it is able to encode the root-cause of any failure regardless of the stage that extracts it. The separation of the counterexample extraction and the encoding makes the architecture extensible: one can add a new stage with a new counterexample extraction strategy and leverage the existing encoding.

3. The Cinnabar tool (Sec. 5). We develop a tool, Cinnabar, to help system designers build Mercury models of DAB systems. Cinnabar employs QuickSilver as its teacher and the Z3 SMT solver as its learner. Cinnabar is able to successfully and efficiently complete Mercury sketches of various interesting distributed agreement-based systems.

### 2 The Mercury Parameterized Synthesis Problem

We first briefly review the syntax and semantics of Mercury [17], a modeling language for distributed systems that build on top of verified agreement protocols such as leader election and consensus. Then, we formalize the synthesis problem.

### 2.1 Review: Mercury Systems

Mercury Process Definition. A Mercury system is composed of an arbitrary number n of identical Mercury system processes with process identifiers 1, . . . , n and one environment process. The programmer specifies a system process definition P that consists of (i) a set V of local variables with finite domains, (ii) a set E of events used to communicate between processes, and (iii) a set of locations that the processes can move between. Each event e in E incarnates an acting action A(e) and a reacting action R(e) (e.g., for a rendezvous event, the acting (resp. reacting) action is the send (resp. receive) of that event). All processes start in a location denoted initial. Each location contains a set of action handlers a process in that location can execute. Each handler has an associated action, a Boolean guard over the local variables, and a set of update statements. A partial process definition is depicted on the right.

The language supports five different types of events, namely, broadcast, rendezvous, partition, consensus, and internal. The synchronous broadcast (resp. rendezvous) communication event type is denoted br (resp. rz) and indicates an event where one process synchronously communicates with all other processes (resp. another process). The agreement event type partition, denoted partition, indicates an event where a set of processes agree to partition themselves into winners and losers. For instance, in the figure, `partition<elect>(All,1)` denotes a leader election round with identifier elect where All processes elect 1 winning process that moves to the Leader location, while all other losing processes move to the Replica location. The agreement event type consensus, denoted consensus, indicates an event where a set of processes, each proposing one value, reach consensus on a given set of decided values. For instance, `consensus<vcCmd>(All,1,cmd)` denotes a consensus round with identifier vcCmd where All processes want to agree on 1 decided value from the set of proposed values in the local variable cmd. Finally, the internal event indicates an event where a process is performing its own internal computations. For a communication event, the acting action is a send, while the reacting action is a receive. For a partition event, the acting action is a win, while the reacting action is a lose. Finally, for a consensus event, the acting action is proposing a winning value, while the reacting action is proposing a losing value. We denote by A(E) and R(E) the set of all acting and reacting actions, respectively.

The updates in an action handler may contain send, assignment, goto, and/or conditional statements. Assignment statements are of the form lhs := rhs where lhs is a local variable and rhs is an expression of the appropriate type. The goto statement goto ℓ causes the process to switch to location ℓ (i.e., it can be thought of as the assignment statement vloc := ℓ, where vloc is a special "location variable" that stores the current location of the process). The conditional statements are of the expected form: if(cond) then...else.... We denote by H the set of all handlers in the process, and for each handler h ∈ H we denote its action, guard, and updates as a(h), g(h), and u(h), respectively.

Local Semantics. The local semantics ⟦P⟧ of a process P is expressed as a state-transition system (S, s<sub>0</sub>, E, T), where S is the set of local states, s<sub>0</sub> is the initial state, E is the set of events, and T ⊆ S × {A(E) ∪ R(E)} × S is the set of transitions of ⟦P⟧. A state s ∈ S is a valuation of the variables in V. We let s(v) denote the value of the variable v in state s.

The set of action handlers associated with all acting and reacting actions of all events induces the transitions in T. In particular, a transition t = s <sup>a</sup>→ s′ based on action handler h over action a is in T iff the guard g(h) evaluates to true in s and s′ is obtained by applying the updates u(h) to s.
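The way handlers induce local transitions can be sketched in Python as follows. This is an illustrative model only, not QuickSilver's implementation: the `Handler` class and `local_transitions` function are hypothetical names, guards and updates are plain Python callables over a valuation that includes the location variable vloc, and only the states reachable from the initial valuation are explored.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

State = Tuple[Tuple[str, object], ...]  # hashable valuation of the local variables


def freeze(valuation: Dict[str, object]) -> State:
    """Turn a valuation into a hashable, canonically ordered state."""
    return tuple(sorted(valuation.items()))


@dataclass
class Handler:
    action: str                                                # a(h)
    guard: Callable[[Dict[str, object]], bool]                 # g(h)
    update: Callable[[Dict[str, object]], Dict[str, object]]   # u(h)


def local_transitions(initial: Dict[str, object], handlers: List[Handler]):
    """Explore reachable states; add (s, a, s') iff g(h) holds in s and s' = u(h)(s)."""
    start = freeze(initial)
    seen, frontier, transitions = {start}, [start], []
    while frontier:
        s = frontier.pop()
        valuation = dict(s)
        for h in handlers:
            if h.guard(valuation):
                s_next = freeze(h.update(dict(valuation)))
                transitions.append((s, h.action, s_next))
                if s_next not in seen:
                    seen.add(s_next)
                    frontier.append(s_next)
    return seen, transitions
```

For instance, two handlers guarded by `vloc == "initial"`, one for the acting action of a leader election that moves to Leader and one for the reacting action that moves to Replica, yield three reachable local states and two transitions.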

Global Semantics. The global semantics ⟦P, n⟧ of a Mercury system P<sub>1</sub> ∥ . . . ∥ P<sub>n</sub> ∥ P<sub>e</sub> consisting of n identical processes P<sub>1</sub>, . . . , P<sub>n</sub> and an environment process P<sub>e</sub> (with local state space S<sub>e</sub>) is expressed as a transition system (Q, q<sub>0</sub>, E, R), where Q = S<sup>n</sup> × S<sub>e</sub> is the set of global states, q<sub>0</sub> is the initial global state, E is the set of events, and R ⊆ Q × E × Q is the set of global transitions of ⟦P, n⟧.

The set of events E induces the transitions in R. As is the case for events, there are five types of global transitions: broadcast, rendezvous, partition, consensus, and internal. In particular, a transition r = q <sup>e</sup>→ q′ for some broadcast event e is in R iff the send local transition q[i] <sup>A(e)</sup>→ q′[i] is in T for some process P<sub>i</sub>, and the receive local transition q[j] <sup>R(e)</sup>→ q′[j] is in T for every other process P<sub>j</sub> with j ≠ i. The remaining global transitions can be formalized similarly.

A trace of a Mercury system is a sequence q<sub>0</sub>, q<sub>1</sub>, . . . of global states such that for every i ≥ 0, the global transition q<sub>i</sub> <sup>e</sup>→ q<sub>i+1</sub> for some event e is in R. A global state q is reachable if there is a trace that ends in it.

Permissible Safety Specifications. QuickSilver targets parameterized verification for a class of properties called permissible safety specifications that disallow global states where m or more processes, for some fixed number m, are in some subset of the local states. We denote by φ<sub>s</sub>(n) the permissible safety specifications provided by the designer for a system with n processes. A Mercury system is safe if there are no reachable error states in its global semantics. We denote that as ⟦P, n⟧ ⊨ φ<sub>s</sub>(n).

The Efficiently-Decidable Fragment. QuickSilver identifies a fragment of Mercury for which the parameterized verification problem of a large class of safety properties is efficiently-decidable. In particular, a pair ⟨P, φ⟩ of a Mercury process P and a safety specification φ is in the efficiently-decidable fragment of Mercury if it satisfies phase-compatibility and cutoff-amenability conditions. For such a pair, a cutoff number c of processes can be computed and the parameterized verification problem can be reduced to the verification of the cutoff-sized system (i.e., ∀n : ⟦P, n⟧ ⊨ φ<sub>s</sub>(n) ⇔ ⟦P, c⟧ ⊨ φ<sub>s</sub>(c)).

During verification, QuickSilver computes a set of phases that an execution of the system goes through. On a high level, the phase-compatibility conditions ensure that the system moves between phases through "globally-synchronizing" events (i.e., broadcast, partition, or consensus), and that all processes in the same phase can participate in further globally-synchronizing events. This ensures that the system's ability to move between phases is independent of the number of processes in the system. The cutoff-amenability conditions ensure that an error state, where m processes are in a subset of the local states violating some safety specification, is reachable in a system of any size iff it is reachable in a system with exactly m processes. If any of these conditions fails, the designer must modify the process definition manually and attempt the verification again. We denote by ⟦P⟧ ⊨ φ<sub>pc</sub> (resp. ⟦P⟧ ⊨ φ<sub>ca</sub>) that the Mercury process P with local semantics ⟦P⟧ satisfies phase-compatibility (resp. cutoff-amenability) conditions.

### 2.2 Mercury Process Sketch

Let us extend Mercury's syntax to allow process sketches that can be completed by a synthesizer. In particular, we allow the process definition to include a set of uninterpreted functions that can replace various expressions in Mercury such as the Boolean expression cond in if(cond) then . . . else . . ., the target locations of goto statements, and the rhs of assignments.<sup>3</sup> As is standard, each uninterpreted function f is equipped with a signature determining its list of named, typed parameters and its return type. A valid list of arguments arg for some function f is a list of values with types that match the function's parameter list. Applying a function f to a valid list of arguments arg is denoted by f(arg). Additionally, we define a function interpretation I(f) of an uninterpreted function f as a mapping from every valid list of arguments of f to a valid return value.

<sup>3</sup> Such uninterpreted functions are sufficient to be a building block for more complex expressions and statements (See, for instance, the Sketch Language [33]).

Fig. 1: Overview of Cinnabar's architecture.

A Mercury process definition P that contains one or more uninterpreted functions is called a sketch, and is denoted P<sub>sk</sub>. We denote by F<sub>sk</sub> the set of all uninterpreted functions in a sketch P<sub>sk</sub>. An interpretation I of the set F<sub>sk</sub> of uninterpreted functions is then a mapping from every uninterpreted function f<sub>sk</sub> ∈ F<sub>sk</sub> to some function interpretation I(f<sub>sk</sub>).

For some process sketch P<sub>sk</sub> and some interpretation I of the set F<sub>sk</sub> of uninterpreted functions in P<sub>sk</sub>, we denote by P<sub>I</sub> the interpreted process sketch obtained by replacing every uninterpreted function f<sub>sk</sub> ∈ F<sub>sk</sub> in the sketch P<sub>sk</sub> with its function interpretation I(f<sub>sk</sub>) according to the interpretation I.
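The notions of function interpretation and interpreted sketch can be illustrated with a small Python sketch. It is a toy model under stated assumptions: a function interpretation is represented as a finite lookup table, a sketched update is a Python function that receives the hole as an extra argument, and the names `all_interpretations` and `interpret` are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, Iterator, Sequence, Tuple

# A function interpretation I(f): a total map from valid argument lists to return values.
FunctionInterpretation = Dict[Tuple, object]


def all_interpretations(arg_domains: Sequence[Sequence],
                        return_domain: Sequence) -> Iterator[FunctionInterpretation]:
    """Enumerate every function interpretation of one uninterpreted function."""
    args = list(product(*arg_domains))
    for values in product(return_domain, repeat=len(args)):
        yield dict(zip(args, values))


def interpret(sketch_update: Callable, interpretation: FunctionInterpretation) -> Callable:
    """Replace the hole of a sketched update by a lookup into its interpretation."""
    def concrete_update(valuation: dict) -> dict:
        return sketch_update(valuation, lambda *a: interpretation[a])
    return concrete_update
```

As a usage example, a sketched statement `goto f(won)` with a Boolean parameter and two candidate target locations has 2<sup>2</sup> = 4 function interpretations. Because all domains are finite, the set of interpretations is finite as well.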

#### 2.3 Problem Definition

We now define the parameterized synthesis problem for Mercury systems.

Definition 1 (Mercury Parameterized Synthesis Problem (MPSP)). Given a process sketch P<sub>sk</sub> with a set of uninterpreted functions F<sub>sk</sub>, an environment process P<sub>e</sub>, and permissible safety specification φ<sub>s</sub>(n), find an interpretation I of uninterpreted functions in F<sub>sk</sub> such that the system P<sub>I,1</sub> ∥ . . . ∥ P<sub>I,n</sub> ∥ P<sub>e</sub> is safe for any number of processes, i.e., ∀n : ⟦P<sub>I</sub>, n⟧ ⊨ φ<sub>s</sub>(n).

### 3 Constraint-Based Synthesis for Mercury Systems

Architecture. To solve MPSP, we propose a multi-stage, counterexample-based architecture, shown in Fig. 1, with the following components:


Synthesis Procedure. Cinnabar instantiates this architecture as shown in Algo. 1. The algorithm starts with an empty set of constraints, C (Line 2), over the set F<sub>sk</sub> of uninterpreted functions in the process sketch P<sub>sk</sub>. In each iteration, it checks if there exists an interpretation I of the uninterpreted functions that satisfies all the constraints collected so far (Line 4). If such an interpretation is found, it is used to obtain an interpreted process sketch P<sub>I</sub> (Line 6). Then, the algorithm checks if the system described by P<sub>I</sub> is phase-compatible and cutoff-amenable. If so, a cutoff c is computed (Line 13) and the c-sized system is checked to be safe. The cutoff-amenability stage is similar to phase-compatibility and is hence omitted from the algorithm. At any stage, if the process fails to satisfy any of these properties (e.g., a counterexample cex<sup>p</sup> to phase-compatibility is found on Line 8), the root-cause of the failure is extracted and encoded into a constraint for the learner to rule out the failure (e.g., Line 10).

#### Algorithm 1: Solving MPSP.

<sup>4</sup> While MPSP targets permissible safety specifications, in order to improve the quality of the interpreted process sketch P<sub>I</sub>, we extend Mercury with liveness specifications to help rule out trivial completions that are safe. We emphasize that such specifications are only used as a tool to improve the quality of synthesis, and are only guaranteed for the cutoff-sized system, as opposed to safety properties that are guaranteed for any system size.

Note that these stages are checked sequentially due to the inherent dependency between them: (i) the system can only be cutoff amenable if it is phase compatible, and (ii) one can only check safety after a cutoff has been computed.
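The overall loop can be summarized by the following Python sketch. It is a generic counterexample-guided skeleton under simplifying assumptions, not Cinnabar's implementation: the learner is modelled as a filter over an explicit candidate enumeration rather than an SMT solver, each teacher stage is a function that returns either `None` (stage passed) or a blocking constraint standing in for the encoded root-cause, and all names are hypothetical.

```python
from typing import Callable, Iterable, List, Optional

Candidate = object
Constraint = Callable[[Candidate], bool]             # True iff the candidate is still allowed
Stage = Callable[[Candidate], Optional[Constraint]]  # None on success, else a constraint


def synthesize(candidates: Iterable[Candidate], stages: List[Stage]) -> Optional[Candidate]:
    constraints: List[Constraint] = []               # starts empty
    for candidate in candidates:                     # learner proposes a completion
        if not all(allowed(candidate) for allowed in constraints):
            continue                                 # excluded by an earlier root-cause
        for stage in stages:                         # stages are checked sequentially
            blocking = stage(candidate)
            if blocking is not None:
                constraints.append(blocking)         # encode the failure for the learner
                break
        else:
            return candidate                         # all stages passed
    return None                                      # no completion satisfies all stages
```

A stage that returns a constraint excluding many candidates at once (for example, all completions sharing the faulty choice) prunes the search more aggressively than one that excludes only the current candidate.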

Lemma 1. Assuming that the teacher is sound and the learner is complete for finite sets of interpretations, Algo. 1 for solving MPSP is sound and complete.

Proof. Soundness follows directly from the soundness of the teacher. Completeness follows from the fact that the encoding and extraction procedures ensure progress by eliminating at least the current interpretation at each iteration, together with the finiteness of the set of interpretations. Finiteness follows from (i) the finite number of uninterpreted functions in a sketch P<sub>sk</sub>, (ii) the finiteness of the domain of each local variable, and (iii) the finiteness of the number of local variables in P<sub>sk</sub>.

In the remainder of this section, we describe the property-agnostic encode component in Algo. 1. In the following section, we describe our implementation of our synthesis procedure specialized to a QuickSilver-based teacher and property-specific extraction procedures.

#### Property-Agnostic Counterexample Encoding Procedure

We first describe the necessary augmentation of local semantics with disabled transitions needed for Cinnabar's counterexample extraction and encoding. While such transitions are not relevant when reasoning about a "concrete" process definition (i.e., one with no uninterpreted functions), they are quite important when extracting an explanation for why some conditions (e.g., phase-compatibility) fail to hold on ⟦P⟧.

Augmented Local Semantics of the Mercury Process P<sub>I</sub>. We extend the definition of the local semantics of a Mercury interpreted process sketch P<sub>I</sub> to be ⟦P<sub>I</sub>⟧ = (S<sub>I</sub>, s<sub>0</sub>, E, T<sub>I</sub>, T<sup>dis</sup><sub>I</sub>), where S<sub>I</sub>, s<sub>0</sub>, E, and T<sub>I</sub> are defined as before and T<sup>dis</sup><sub>I</sub> is the set of disabled transitions under the current interpretation I. In particular, a disabled transition t = $s \xrightarrow{a} \bot$ based on action handler h over action a is in T<sup>dis</sup><sub>I</sub> iff the guard g(h) evaluates to false in s. The symbol ⊥ here indicates that no local state is reachable, since the guard is disabled.

Additionally, we say a transition t = $s \xrightarrow{a} s'$ based on action handler h over action a is a sketch transition if h contains no uninterpreted functions in its guard or updates. A local state s ∈ S<sub>I</sub> is concrete if (i) s is the initial state s<sub>0</sub>, or (ii) there exists a sketch transition $s' \to s$ where s′ is concrete. In other words, a local state s is concrete if there exists a path from the initial state s<sub>0</sub> to s that is composed purely of sketch transitions and hence is always reachable regardless of the interpretation we obtain from the learner.
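The definition of concrete states amounts to reachability from the initial state over sketch transitions only. A minimal sketch follows, assuming (hypothetically) that each local transition is given as a triple `(src, dst, is_sketch)`, where `is_sketch` records that the handler has no uninterpreted functions in its guard or updates.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Transition = Tuple[Hashable, Hashable, bool]  # (src, dst, is_sketch), hypothetical layout


def concrete_states(initial: Hashable,
                    transitions: Iterable[Transition]) -> Set[Hashable]:
    """States reachable from `initial` using sketch transitions only."""
    successors: Dict[Hashable, List[Hashable]] = {}
    for src, dst, is_sketch in transitions:
        if is_sketch:                      # ignore interpretation-dependent edges
            successors.setdefault(src, []).append(dst)

    seen = {initial}                       # case (i): the initial state is concrete
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        for nxt in successors.get(state, []):
            if nxt not in seen:            # case (ii): sketch edge from a concrete state
                seen.add(nxt)
                queue.append(nxt)
    return seen


example = [
    ("s0", "s1", True),
    ("s1", "s2", False),   # depends on an uninterpreted function
    ("s2", "s3", True),    # s2 is not concrete, so s3 is not either
    ("s0", "s4", True),
]
reachable = concrete_states("s0", example)
```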

We now formalize counterexamples for phase-compatibility and cutoff-amenability properties, then present an encoding procedure for such counterexamples. The encoding is exact in the sense that a generated constraint c corresponding to some counterexample cex rules out exactly those interpretations I where the interpreted process sketch P<sub>I</sub> exhibits cex (as opposed to an over-approximation where c would rule out interpreted process sketches that do not exhibit cex, or an under-approximation where c would allow interpreted process sketches that do exhibit cex). Additionally, the encoding is property-agnostic in the sense that it can handle counterexamples for any property failure.

Counterexamples. Recall that a candidate process P<sub>I</sub> based on some process sketch P<sub>sk</sub> and interpretation I has the local semantics ⟦P<sub>I</sub>⟧ = (S<sub>I</sub>, s<sub>0</sub>, E, T<sub>I</sub>, T<sup>dis</sup><sub>I</sub>). A counterexample cex to phase-compatibility (resp. cutoff-amenability) is a "subset" of the local semantics ⟦P<sub>I</sub>⟧ such that cex ⊭ φ<sub>pc</sub> (resp. cex ⊭ φ<sub>ca</sub>). We say that cex is a subset of ⟦P<sub>I</sub>⟧, denoted cex ⊆ ⟦P<sub>I</sub>⟧, when it has a subset of its enabled and disabled transitions, i.e., cex = (S<sub>I</sub>, s<sub>0</sub>, E, T′<sub>I</sub> ⊆ T<sub>I</sub>, T′<sup>dis</sup><sub>I</sub> ⊆ T<sup>dis</sup><sub>I</sub>).

Encoding Counterexamples. Let C be the set of all well-typed constraints that the learner accepts. The encoding of counterexample cex = (S<sub>I</sub>, s<sub>0</sub>, E, T<sub>I</sub>, T<sup>dis</sup><sub>I</sub>) w.r.t. interpretation I is a formula ⟨⟨cex⟩⟩<sub>I</sub> ∈ C defined as:

$$\langle \langle cex \rangle \rangle\_I = \left( \bigwedge\_{t\_{en} \in T\_I} \langle \langle t\_{en} \rangle \rangle\_I \right) \wedge \left( \bigwedge\_{t\_{dis} \in T\_I^{dis}} \langle \langle t\_{dis} \rangle \rangle\_I \right),$$

where ⟨⟨t<sub>en</sub>⟩⟩<sub>I</sub> (resp. ⟨⟨t<sub>dis</sub>⟩⟩<sub>I</sub>) is an encoding of an enabled (resp. disabled) local transition. Note that ⟨⟨cex⟩⟩<sub>I</sub> is satisfied under interpretation I (i.e., I ⊨ ⟨⟨cex⟩⟩<sub>I</sub>) and implies that cex ⊆ ⟦P⟧. An encoding of some enabled transition t<sub>en</sub> = $s \xrightarrow{a} s'$ based on action handler h over action a is defined as:

$$\langle \langle s \xrightarrow{a} s' \rangle \rangle\_I = \langle \langle s \rangle \rangle\_I \land \langle \langle a : s \rangle \rangle\_I \land \langle \langle s' : s, a \rangle \rangle\_I,$$

where:


Example. Let uf(x, y) be an uninterpreted function over local int variables x and y. Let the local state s := {v<sub>loc</sub> = F, x = 1, y = 2}, and let the local guard of action handler h over action a in location F be g := uf(x, y) > 7 ∨ x = 2. Then ⟨⟨a : s⟩⟩<sub>I</sub> = ((uf(s(x), s(y)) > 7 ∨ s(x) = 2) = true), which is ((uf(1, 2) > 7 ∨ 1 = 2) = true), which simplifies to uf(1, 2) > 7.

3. the predicate ⟨⟨s′ : s, a⟩⟩<sub>I</sub> indicating that s goes to s′ on action a. The predicate ⟨⟨s′ : s, a⟩⟩<sub>I</sub> is defined as follows. Let u(h) denote the set of updates of the form lhs := rhs of handler h over action a. Then, ⟨⟨s′ : s, a⟩⟩<sub>I</sub> := ⋀<sub>lhs:=rhs ∈ u(h)</sub> s′(lhs) = rhs[s(V)/V].

Example. Let the set of updates have the single update x := uf(y, z) and let s, s′ be {v<sub>loc</sub> = F, x = 1, y = 2, z = 3} and {v<sub>loc</sub> = D, x = 5, y = 2, z = 3}, respectively. Then ⟨⟨s′ : s, a⟩⟩<sub>I</sub> is s′(x) = uf(s(y), s(z)), which is uf(2, 3) = 5.

An encoding of some disabled transition t<sub>dis</sub> = $s \xrightarrow{a} \bot$ in cex is defined as ⟨⟨t<sub>dis</sub>⟩⟩<sub>I</sub> = ⟨⟨s⟩⟩<sub>I</sub> ∧ ⟨⟨¬a : s⟩⟩<sub>I</sub>, where ⟨⟨s⟩⟩<sub>I</sub> is as before and the predicate ⟨⟨¬a : s⟩⟩<sub>I</sub>, indicating that the process cannot perform action a from state s, is defined as follows: ⟨⟨¬a : s⟩⟩<sub>I</sub> := (g(h)[s(V)/V] = false).

The intuition behind breaking a transition's encoding into various predicates is that some phase-compatibility conditions leave parts of a transition unspecified. For instance, the predicate "the local state s can react to event e" corresponds to a local transition $s \xrightarrow{R(e)} ∗$ ∈ T<sub>I</sub> with encoding ⟨⟨s⟩⟩<sub>I</sub> ∧ ⟨⟨R(e) : s⟩⟩<sub>I</sub>.

Finally, to rule out any interpretation I that exhibits cex, we add the constraint c = ¬⟨⟨cex⟩⟩<sub>I</sub> to the learner.
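The guard, update, and disabled-transition predicates, and the final negation, can be mimicked in a few lines of Python. This is a sketch under simplifying assumptions, not Cinnabar's encoder: an interpretation is modeled as a plain Python callable `uf`, constraints are Python predicates over `uf` rather than SMT formulas, and the state predicate ⟨⟨s⟩⟩ is left out because its definition is not reproduced here. All function names are hypothetical.

```python
from typing import Callable, Dict, Iterable

State = Dict[str, object]
UF = Callable[..., int]                      # one candidate interpretation
Constraint = Callable[[UF], bool]            # predicate over interpretations
Guard = Callable[[UF, State], bool]
Update = Callable[[UF, State], object]       # right-hand side evaluated in s


def encode_guard(s: State, guard: Guard) -> Constraint:
    """Guard predicate: the guard, with s substituted, evaluates to true."""
    return lambda uf: bool(guard(uf, s))


def encode_disabled(s: State, guard: Guard) -> Constraint:
    """Disabled predicate: the guard, with s substituted, evaluates to false."""
    return lambda uf: not guard(uf, s)


def encode_update(s: State, s_next: State, updates: Dict[str, Update]) -> Constraint:
    """Update predicate: every update lhs := rhs yields s_next(lhs) from s."""
    return lambda uf: all(s_next[lhs] == rhs(uf, s) for lhs, rhs in updates.items())


def encode_cex(enabled: Iterable[Constraint], disabled: Iterable[Constraint]) -> Constraint:
    """Conjunction of all transition encodings in the counterexample."""
    parts = list(enabled) + list(disabled)
    return lambda uf: all(p(uf) for p in parts)


def block(cex: Constraint) -> Constraint:
    """Constraint handed to the learner: the negation of the cex encoding."""
    return lambda uf: not cex(uf)


# The two examples from the text.
s = {"loc": "F", "x": 1, "y": 2}
guard = lambda uf, st: uf(st["x"], st["y"]) > 7 or st["x"] == 2
guard_pred = encode_guard(s, guard)          # amounts to uf(1, 2) > 7

s2 = {"loc": "F", "x": 1, "y": 2, "z": 3}
s2_next = {"loc": "D", "x": 5, "y": 2, "z": 3}
update_pred = encode_update(s2, s2_next, {"x": lambda uf, st: uf(st["y"], st["z"])})

blocking = block(encode_cex([guard_pred, update_pred], []))
```

Here `blocking` rejects every `uf` with `uf(1, 2) > 7` and `uf(2, 3) == 5`, regardless of its values elsewhere, and accepts all others.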

Encoding Counterexamples to Safety Properties. Similar to the local semantics, we extend the definition of the global semantics ⟦P<sub>I</sub>, n⟧ of a Mercury system P<sub>I,1</sub> || … || P<sub>I,n</sub> || P<sub>e</sub> to be ⟦P<sub>I</sub>, n⟧ = (Q<sub>I</sub>, q<sub>0</sub>, E, R<sub>I</sub>, R<sup>dis</sup><sub>I</sub>), where Q<sub>I</sub>, q<sub>0</sub>, E, and R<sub>I</sub> are defined as before and R<sup>dis</sup><sub>I</sub> is the set of disabled global transitions under the current interpretation I. Then, a counterexample cex to safety is a "subset" of the global semantics ⟦P<sub>I</sub>, c⟧ such that cex ⊭ φ<sub>s</sub>(c). Encoding of such a counterexample cex is formalized as before, with the encoding of an enabled global transition r in cex being a formula ⟨⟨r⟩⟩<sub>I</sub> ∈ C computed as follows. For some global transition r = $q \xrightarrow{e} q'$, we denote by active(r) the local transitions that processes in q locally use to end in q′. That is, active(r) = {t ∈ T<sub>I</sub> | ∃P<sub>I,i</sub> : t = $q[i] \xrightarrow{A(e)} q'[i]$ ∨ t = $q[i] \xrightarrow{R(e)} q'[i]$}. We then define the encoding ⟨⟨r⟩⟩<sub>I</sub> as: ⟨⟨r⟩⟩<sub>I</sub> = ⋀<sub>t ∈ active(r)</sub> ⟨⟨t⟩⟩<sub>I</sub>.

Note that the predicates ⟨⟨q⟩⟩<sub>I</sub>, ⟨⟨e : q⟩⟩<sub>I</sub>, ⟨⟨q′ : q, e⟩⟩<sub>I</sub>, and ⟨⟨¬e : q⟩⟩<sub>I</sub>, as well as the encoding for the global disabled transitions, can be defined similarly to their counterparts discussed earlier.

### 4 Counterexample Extraction

Our tool specializes the synthesis procedure in Algo. 1 by using QuickSilver as the teacher to check phase-compatibility, cutoff-amenability, and safety. For the remainder of this section, we will refer to phase-compatibility and cutoff-amenability conditions as local properties and safety (and liveness) specifications as global properties.

Local Properties. Given a local property φ expressed as a first-order logic formula over the local semantics of a Mercury process, Cinnabar extracts a counterexample cex according to Algo. 2.

First, we negate the property and express it in disjunctive normal form (DNF): φ′ = ¬φ = c<sub>1</sub> ∨ c<sub>2</sub> ∨ …, where each cube c<sub>i</sub> = l<sub>1</sub> ∧ l<sub>2</sub> ∧ … is a conjunction of literals (Line 2). Then, for each cube c satisfied under ⟦P<sub>I</sub>⟧ (Line 5), we extract a cube witness cw that is a subset of the local semantics ⟦P<sub>I</sub>⟧ such that ⟦P<sub>I</sub>⟧ ⊨ cw (Lines 7-9). This is done by extracting, for each literal l in c, a minimal subset lw of ⟦P<sub>I</sub>⟧ such that lw ⊨ l (Line 8). We say lw is a minimal witness of l if no strict subset of lw is a witness for l (i.e., ∀lw′ ⊂ lw : lw′ ⊭ l). Finally, we pick a minimal (in terms of size) cube witness of some cube c as the cex (Line 11). Since cex ⊨ c and c ⇒ ¬φ, we know that cex ⊨ ¬φ (or equivalently, cex ⊭ φ).
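The extraction steps described above can be sketched generically in Python. The representation is an assumption made for illustration: the local semantics is a flat set of facts (for example, enabled and disabled transitions), and each literal is a hypothetical callable that returns a minimal witness (a subset of the facts) or `None` when the literal does not hold. Property-specific witness procedures would replace the toy `has` literal.

```python
from typing import Callable, FrozenSet, Hashable, List, Optional, Sequence

Fact = Hashable
Semantics = FrozenSet[Fact]
Witness = FrozenSet[Fact]
Literal = Callable[[Semantics], Optional[Witness]]   # minimal witness, or None
Cube = Sequence[Literal]


def extract_cex(cubes: Sequence[Cube], semantics: Semantics) -> Optional[Witness]:
    """Return a smallest cube witness over all satisfied cubes of the negated property."""
    best: Optional[Witness] = None
    for cube in cubes:
        parts: List[Optional[Witness]] = [literal(semantics) for literal in cube]
        if any(p is None for p in parts):
            continue                                  # cube not satisfied
        cube_witness = frozenset().union(*parts)      # union of literal witnesses
        if best is None or len(cube_witness) < len(best):
            best = cube_witness
    return best


def has(fact: Fact) -> Literal:
    """Toy literal: 'fact is in the semantics', witnessed by the fact itself."""
    return lambda sem: frozenset({fact}) if fact in sem else None


sem = frozenset({"t1", "t2", "t3"})
cubes = [
    [has("t1"), has("t9")],               # unsatisfied: t9 is absent
    [has("t1"), has("t2"), has("t3")],    # witness of size 3
    [has("t2"), has("t3")],               # witness of size 2
]
cex = extract_cex(cubes, sem)
```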

In this work, we carefully analyzed the phase-compatibility and cutoff-amenability conditions and incorporated procedures to compute witnesses for their literals (i.e., the witness calls on Line 8). We refer the interested reader to the extended version [19] of this paper for complete details, and illustrate one such counterexample extraction procedure using an example.

Example. We present a simplified phase-compatibility condition and demonstrate the above procedure on it. Let the set of broadcast, partition, and consensus events be called the globally-synchronizing events, denoted E<sub>global</sub>. Let ph(s) be the set of all "phases" containing local state s. The condition states that: for each internal transition $s \to s'$ that is accompanied by a reacting transition $s' \xrightarrow{R(f)} s''$ for some globally-synchronizing event f, and for each state t in the same phase as s, state t must have a reacting transition of event f. Formally:

$$\begin{aligned} \forall \mathbf{f} \in E\_{\texttt{global}}, s, s' \in S: \\ \left(s \to s' \in T \land s' \xrightarrow{R(\mathbf{f})} \* \in T\right) \Rightarrow \left(\forall X \in ph(s), t \in X: \exists t \xrightarrow{R(\mathbf{f})} \* \in T\right). \end{aligned}$$

This condition is an example of a local property φ we want to extract counterexamples for when it fails. The procedure is applied as follows: Step (1): We first simplify φ to the following:

$$\begin{aligned} &\forall \mathbf{f} \in E\_{\texttt{global}}, s, s', t \in S, X \in ph(s): \\ &\left(s \to s' \in T \land s' \xrightarrow{R(\mathbf{f})} \* \in T \land inPhase(X, s, t)\right) \Rightarrow \left(\exists t \xrightarrow{R(\mathbf{f})} \* \in T\right), \end{aligned}$$

where inPhase(X, s, t) indicates that states s and t are in phase X together. We then obtain the negation ¬φ:

$$\begin{aligned} \exists \mathbf{f} \in E\_{\texttt{global}}, s, s', t \in S, X \in ph(s) :\\ s \to s' \in T \land s' \xrightarrow{R(\mathbf{f})} \* \in T \land inPhase(X, s, t) \land \neg \exists t \xrightarrow{R(\mathbf{f})} \* \in T. \end{aligned}$$

Step (2): The formula ¬φ is in DNF, and there is a cube for each instantiation of event f ∈ E<sub>global</sub>, states s, s′, t ∈ S, and phase X that satisfies the formula ¬φ. There are 4 literals. The literals "$s \to s' \in T$" and "$s' \xrightarrow{R(f)} ∗ \in T$" can be witnessed by the corresponding transitions $s \to s'$ and $s' \xrightarrow{R(f)} ∗$, respectively. The literal "$\neg \exists t \xrightarrow{R(f)} ∗ \in T$" can be witnessed by the disabled transition $t \xrightarrow{R(f)} \bot$. The witness for the literal inPhase(X, s<sub>a</sub>, s<sub>b</sub>) for some phase X and local states s<sub>a</sub> and s<sub>b</sub> is more involved, as it depends on the nature of that phase. We analyzed the phase construction procedure given in [17] and distilled it as follows. For each event e ∈ E<sub>global</sub>, we define its source (resp. destination) set to be the set of states in S from (resp. to) which there exists a transition in T labeled with an acting or reacting action of event e. Let corePhases be the set of all source and destination sets of all globally-synchronizing actions. Then, two states s<sub>a</sub> and s<sub>b</sub> are in the same phase if:


If X is a core phase (i.e., case (A) holds), the counterexample extraction procedure returns the phase itself. Otherwise, case (B) holds and the two core phases are recursively extracted as well as the internal path connecting them.
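The source and destination sets that make up the core phases can be computed directly from the labeled transitions. The sketch below assumes a hypothetical encoding in which a transition is `(src, (kind, event), dst)` with `kind` being `"A"` (acting), `"R"` (reacting), or `"internal"`. Dropping empty source or destination sets is a choice made here for illustration, and the recursive construction of non-core phases is not covered.

```python
from typing import Dict, FrozenSet, Iterable, Optional, Set, Tuple

Label = Tuple[str, Optional[str]]            # (kind, event), hypothetical layout
Transition = Tuple[str, Label, str]          # (src, label, dst)


def core_phases(transitions: Iterable[Transition],
                global_events: Set[str]) -> Set[FrozenSet[str]]:
    """Collect the source and destination sets of every globally-synchronizing event."""
    sources: Dict[str, Set[str]] = {e: set() for e in global_events}
    dests: Dict[str, Set[str]] = {e: set() for e in global_events}
    for src, (kind, event), dst in transitions:
        if event in global_events and kind in ("A", "R"):
            sources[event].add(src)          # state from which e is acted/reacted
            dests[event].add(dst)            # state reached by acting/reacting on e
    phases: Set[FrozenSet[str]] = set()
    for event in global_events:
        for group in (sources[event], dests[event]):
            if group:                        # assumption: skip empty sets
                phases.add(frozenset(group))
    return phases


example = [
    ("a", ("A", "f"), "c"),
    ("b", ("R", "f"), "c"),
    ("c", ("internal", None), "d"),          # internal moves do not define core phases
    ("d", ("R", "g"), "a"),
]
phases = core_phases(example, {"f", "g"})
```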

Step (3): The final step is to build a subset of the local semantics that includes the extracted witnesses for all 4 literals.
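For the simplified condition, Steps (1) to (3) boil down to searching for an instantiation of (f, s, s′, t, X) that satisfies ¬φ and collecting the witnessing transitions. The following sketch makes several assumptions that are not part of the procedure in the text: phases are supplied as precomputed sets, the inPhase witness is simply the phase itself, enabled reacting transitions are given as `(state, event)` pairs, and the string `"BOTTOM"` stands for ⊥.

```python
from typing import Dict, FrozenSet, Iterable, Optional, Set, Tuple


def find_counterexample(internal: Set[Tuple[str, str]],
                        reacts: Set[Tuple[str, str]],
                        phases: Iterable[FrozenSet[str]],
                        global_events: Set[str]) -> Optional[Dict[str, object]]:
    """Search for a violation of the simplified condition and return its witnesses."""
    phase_list = list(phases)
    for s, s_next in sorted(internal):                # literal: s -> s' in T
        for f in sorted(global_events):
            if (s_next, f) not in reacts:             # literal: s' reacts to f
                continue
            for phase in phase_list:
                if s not in phase:
                    continue
                for t in sorted(phase):               # literal: inPhase(X, s, t)
                    if (t, f) not in reacts:          # literal: t cannot react to f
                        return {
                            "enabled": {(s, "internal", s_next),
                                        (s_next, f"R({f})", "*")},
                            "disabled": {(t, f"R({f})", "BOTTOM")},
                            "phase": frozenset(phase),
                        }
    return None


internal = {("a", "b")}
reacts = {("b", "f"), ("a", "f")}
phases = [frozenset({"a", "c"})]
cex = find_counterexample(internal, reacts, phases, {"f"})
```

In this toy instance, state `c` shares a phase with `a` but has no reacting transition on `f`, so the returned subset contains the disabled transition from `c`.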

Global Properties. If a candidate process P<sub>I</sub> meets its phase-compatibility and cutoff-amenability conditions, then it belongs to the efficiently-decidable fragment of Mercury, and a cutoff c exists. It then remains to check if the system P<sub>I,1</sub> || … || P<sub>I,n</sub> || P<sub>e</sub> is safe (i.e., ⟦P<sub>I</sub>, c⟧ ⊨ φ<sub>s</sub>(c)).

Safety properties φ<sub>s</sub>(n) are specified by the system designer as (Boolean combinations of) permissible safety specifications. Such properties are invariants that must hold in every reachable state in ⟦P<sub>I</sub>, c⟧.

A counterexample cex ⊆ ⟦P<sub>I</sub>, c⟧ to a safety property φ<sub>s</sub>(c) is a finite trace from the initial state q<sub>0</sub> to an error state q<sub>e</sub>. Such traces are extracted while constructing ⟦P<sub>I</sub>, c⟧.
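Extracting a trace during state-space construction can be illustrated with a breadth-first search that records parent pointers. This is a generic sketch, not QuickSilver's exploration: `successors` and `is_error` are hypothetical callbacks, and the trace lists only states, whereas an actual counterexample would also carry the events labeling each step.

```python
from collections import deque
from typing import Callable, Dict, Hashable, Iterable, List, Optional

GlobalState = Hashable


def find_error_trace(q0: GlobalState,
                     successors: Callable[[GlobalState], Iterable[GlobalState]],
                     is_error: Callable[[GlobalState], bool]
                     ) -> Optional[List[GlobalState]]:
    """Explore from q0; on reaching an error state, rebuild the path back to q0."""
    parent: Dict[GlobalState, Optional[GlobalState]] = {q0: None}
    queue = deque([q0])
    while queue:
        q = queue.popleft()
        if is_error(q):
            trace: List[GlobalState] = []
            cur: Optional[GlobalState] = q
            while cur is not None:            # follow parent pointers to q0
                trace.append(cur)
                cur = parent[cur]
            return trace[::-1]
        for nxt in successors(q):
            if nxt not in parent:             # first visit fixes the parent
                parent[nxt] = q
                queue.append(nxt)
    return None                               # no reachable error state


graph = {0: [1, 2], 1: [3], 2: [3], 3: [4], 4: []}
trace = find_error_trace(0, lambda q: graph[q], lambda q: q == 4)
```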

### 5 Implementation and Evaluation

### 5.1 Implementation

Our tool, Cinnabar<sup>5</sup>, implements the architecture illustrated in Fig. 1. Additionally, it incorporates a liveness checker into the teacher. Liveness properties φ<sub>l</sub>(c) ensure that the system makes progress and eventually reacts to various events. We refer the interested reader to the extended version [19] for details on specifying liveness properties as well as extracting and encoding counterexamples to such properties.

### 5.2 Evaluation

In this section, we investigate Cinnabar's performance. We study the impact of Cinnabar's counterexample extraction and encoding, as well as the choice of uninterpreted functions, on performance. Finally, we examine how Cinnabar's iterations are distributed across the different types of counterexamples.

<sup>5</sup> Cinnabar is publicly available on Zenodo [18].

Fig. 2: Cinnabar's performance compared to enumeration-based synthesis. The systems studied are: Distributed Store (DS), Consortium (CTM), Distributed Lock Service (DLS), Distributed Register (DR), Two-Object Tracker (TOT), Distributed Robot Flocking (DRF), variants of the Small Aircraft Transportation System Landing Protocol (SATS, SATS2), variants of Distributed Sensor Network (DSN, DSNR), and variants of Robotics Motion Planner (RMP, RMPR). For each benchmark, the i-th point denotes the average runtime for all variants with i uninterpreted functions.

Benchmarks. The benchmarks we use are process sketches based on the benchmarks presented in [17]. We refer the reader to the extended version [19] for (i) a description of each benchmark's functionality, its safety and liveness specifications, and the unspecified functionality in the sketch, and (ii) an example Mercury sketch and its completion.

Experimental Setup. To ensure that our reported results are not dependent on a particular choice of uninterpreted functions, we create a set of variants for each benchmark as follows. For each benchmark, we first pick a set ue of "candidate uninterpreted functions", corresponding to expressions that a designer might reasonably leave unspecified. Then, for each subset e in the set P(ue) of all non-empty subsets of ue, we create a variant of the benchmark where the uninterpreted functions in e are included in the sketch. We set a timeout of 15 minutes when running any variant and conduct our experiments on a MacBook Pro with 2 GHz Quad-Core Intel Core i5 and 16 GB of RAM.
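The variant construction is an enumeration of all non-empty subsets of the candidate set, which yields 2<sup>k</sup> − 1 variants for k candidates. A short sketch, with hypothetical candidate names, that also groups the variants by the number of uninterpreted functions they contain:

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List


def benchmark_variants(candidates: Iterable[str]) -> List[FrozenSet[str]]:
    """All non-empty subsets of the candidate uninterpreted functions."""
    items = sorted(set(candidates))
    return [frozenset(combo)
            for size in range(1, len(items) + 1)
            for combo in combinations(items, size)]


def group_by_size(variants: Iterable[FrozenSet[str]]) -> Dict[int, List[FrozenSet[str]]]:
    """Group variants by how many uninterpreted functions they include."""
    groups: Dict[int, List[FrozenSet[str]]] = {}
    for variant in variants:
        groups.setdefault(len(variant), []).append(variant)
    return groups


# Hypothetical candidate names, for illustration only.
variants = benchmark_variants(["uf1", "uf2", "uf3"])
sizes = {size: len(group) for size, group in group_by_size(variants).items()}
```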

Effect of Counterexample Extraction and Encoding. As our baseline, we consider a synthesis loop where the learner enumerates interpretations until a correct interpretation is found. If some interpreted process sketch P<sub>I</sub> fails a property at any stage, we add the constraint c = ¬I to the learner. This effectively eliminates one interpretation at a time, as opposed to eliminating all interpretations that exhibit the given counterexample at once (as done by our encoder). In Fig. 2, we present a comparison of Cinnabar's runtime to this enumeration-based baseline. We make the following observations. While the runtimes of both enumeration-based synthesis and Cinnabar grow exponentially when increasing the number of uninterpreted functions, Cinnabar outperforms enumeration-based synthesis in almost all scenarios. Only for variants with a single uninterpreted function did we observe cases where enumeration-based synthesis found a correct solution faster than Cinnabar (e.g., as in DSNR with one uninterpreted function). This is due to the additional time spent extracting and encoding counterexamples. However, the value of the counterexample extraction and encoding becomes clearly apparent with a larger number of unspecified expressions, as the number of interpretations grows much larger and it becomes infeasible to just enumerate them. Furthermore, Cinnabar is able to perform synthesis for any variant of our benchmarks in under 9 minutes.

Fig. 3: Effect of the choice of uninterpreted functions on synthesis time. For some benchmark and some number m of uninterpreted functions, the m-th box-and-whiskers plot presents, from bottom to top, the minimum, first quartile, median, third quartile, and maximum synthesis run time across the run times of all variants of that benchmark with m uninterpreted functions.

Effect of the Choice of Uninterpreted Functions. In Fig. 3, for each benchmark, we examine the variation of synthesis runtime across variants with the same number of uninterpreted functions. As shown in the figure, in some cases (e.g., CTM and DS), the variation is more noticeable. The main factor contributing to this is that uninterpreted functions present different overhead on synthesis based on their nature. For instance, an uninterpreted function corresponding to a lhs of some assignment expression is more expensive to synthesize compared to an uninterpreted function corresponding to a target of some goto statement, as the latter has a smaller search space.

Counterexample Distribution on Iterations. In Fig. 4, we illustrate the different types of counterexamples encountered throughout Cinnabar's iterations. We make the following observations. First, Cinnabar spends most of its iterations ruling out phase-compatibility violations. This is expected as checking phase-compatibility is the first stage in our synthesis loop. Since a phase-compatible system moves in a structured way between its phases, this stage rules out all arbitrary completions that prohibit processes from advancing through the phases. Furthermore, there are fewer safety violations than any other type of violations. Once an interpreted process sketch is in the efficiently-decidable fragment

[Fig. 4 plot data: per-iteration counterexample types for benchmark variants such as consortium\_8\_12356789, consortium\_9\_123456789, and satspp\_5\_12357.]
1 1 1 1 1 1 1 4 4 4 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 4 1 1 1 1 1 1 2 2 1 2 1 1 3 1 1 1 1 1 1 1 1 1 1 4 4 4 4 4 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 2 1 1 2 1 4 4 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 1 2 1 1 1 1 1 2 2 2 1 1 1 1 1 2 2 1 1 1 1 1 1 1 1 2 2 1 1 1 1 2 2 1 2 2 2 4 4 1 4 4 4 4 2 2 1 1 1 1 1 1 1 1 1 4 1 2 1 2 1 2 3 4 1 4 2 2 4 2 2 1 1 1 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 2 2 1 2 1 1 1 1 1 1 1 1 1 2 2 1 1 4 1 3 1 1 1 1 1 1 2 1 2 2 1 3 1 1 1 2 1 4 1 1 1 2 1 4 1 1 1 2 1 1 1 1 1 1 consortium\_6\_123689 4 1 1 1 1 1 1 4 1 1 1 1 1 4 1 1 1 4 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 3 1 1 1 1 1 1 2 2 1 1 1 1 1 1 1 1 1 1 4 4 4 1 1 1 1 1 1 4 4 4 4 1 1 1 1 1 1 1 1 4 4 4 4 1 1 1 1 1 1 1 1 1 1 1 1 3 4 1 1 1 1 1 1 1 1 4 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 4 4 1 1 1 1 1 1 4 4 1 1 1 1 1 1 1 3 1 2 2 1 2 1 1 2 1 2 3 1 4 1 1 3 1 1 1 1 1 1 1 1 2 2 4 4 4 4 4 4 4 4 4 4 4 1 3 1 1 4 1 3 4 1 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 3 4 2 3 1 1 2 1 2 1 2 1 4 4 4 4 1 1 1 1 1 1 1 1 4 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 4 4 2 1 1 1 1 1 1 1 1 1 1 1 1 3 3 3 3 4 4 1 1 1 4 4 4 4 4 4 4 4 1 1 1 4 1 1 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 1 1 4 4 4 1 1 1 1 1 1 4 2 2 satspp\_6\_123567 4 1 1 1 1 1 1 1 1 1 1 1 3 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 1 1 3 3 1 2 1 2 1 2 2 1 1 1 1 1 1 1 1 3 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 2 1 2 1 2 1 1 1 1 1 2 1 2 2 1 2 2 1 2 2 1 1 1 2 1 1 1 1 2 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 2 2 2 1 2 2 1 1 1 1 1 1 3 3 3 3 2 1 2 1 2 2 1 1 1 2 1 1 2 2 2 1 2 2 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 2 1 1 1 1 1 1 2 2 2 2 2 1 2 2 2 2 2 1 3 3 3 3 1 1 1 1 2 2 1 1 1 1 2 2 2 2 2 2 3 1 3 1 3 3 3 3 3 
3 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 3 3 1 3 3 4 1 3 1 1 1 1 1 2 1 1 2 2 2 2 1 2 2 2 1 2 1 2 2 2 2 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 3 2 2 1 2 2 consortium\_7\_1235789 4 1 1 1 1 1 1 1 1 1 1 1 1 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 1 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 3 4 1 1 1 1 1 3 3 1 1 1 1 1 2 1 1 1 1 4 1 1 1 1 1 1 2 2 2 3 1 1 1 1 2 2 1 1 1 1 2 2 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 1 2 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 4 1 1 1 1 1 1 1 3 4 4 1 1 1 2 1 1 1 1 1 1 1 1 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 4 1 1 4 1 4 1 1 1 1 1 1 2 2 1 1 2 1 4 1 1 1 1 1 1 1 1 4 4 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 4 4 1 4 4 1 1 1 1 1 1 2 1 1 2 1 1 1 1 1 1 2 2 2 1 1 2 1 1 1 1 1 2 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 1 1 1 1 1 1 1 1 1 1 4 4 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 2 4 4 2 2 1 1 1 1 1 1 1 1 1 1 2 2 1 1 1 1 1 1 1 1 consortium\_7\_1456789 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 2 1 4 4 1 1 2 1 1 1 3 1 1 1 1 1 1 1 2 2 2 1 1 2 2 2 2 1 1 1 2 2 2 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 4 2 2 2 4 1 1 1 1 1 1 2 2 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 3 1 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 4 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 3 2 1 1 1 1 1 1 1 1 1 1 3 4 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 1 1 consortium\_7\_1234589 4 1 1 1 1 1 1 4 1 1 1 1 1 4 1 1 1 4 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 1 1 1 1 1 1 1 1 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 4 1 1 1 1 1 1 1 1 1 2 3 1 1 1 1 1 1 1 1 1 1 1 3 1 3 
1 1 1 1 1 1 1 4 4 3 2 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 2 2 1 1 1 3 1 1 1 4 4 1 4 4 1 4 1 1 4 1 4 4 4 1 2 4 1 1 1 1 4 1 2 1 2 1 1 1 1 1 1 2 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 2 1 2 1 1 2 2 1 2 1 2 1 2 1 2 2 1 1 1 1 2 1 1 1 2 2 2 2 1 1 2 2 1 1 1 1 1 1 2 1 1 1 4 1 1 1 1 1 1 1 1 1 2 2 2 1 1 1 4 4 1 2 1 1 2 1 2 2 2 4 4 1 1 1 1 1 1 1 2 3 4 1 1 4 1 2 4 3 4 1 1 1 2 1 1 2 1 1 1 1 2 1 1 2 1 1 1 1 1 1 1 1 1 4 1 1 1 1 1 consortium\_6\_123678 4 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 2 1 2 1 1 2 1 1 1 4 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 2 2 1 1 1 1 1 1 1 1 1 1 1 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 1 1 2 3 3 1 1 3 1 1 1 4 1 1 1 1 1 4 1 1 1 1 1 1 1 4 4 4 1 1 1 1 1 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 4 1 1 1 1 1 1 1 2 1 1 2 1 1 2 1 2 2 1 2 2 1 1 1 1 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 1 1 1 2 2 2 4 4 1 1 1 1 1 1 4 4 4 4 4 4 4 4 4 4 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 4 4 4 4 4 4 4 4 4 4 4 4 1 1 1 1 1 1 4 4 4 4 4 4 4 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 1 1 2 2 1 2 2 2 2 2 2 2 2 2 2 4 1 1 1 1 1 1 1 1 1 1 1 1 4 4 1 1 1 1 1 1 1 1 1 1 1 2 2 1 1 1 1 1 1 1 1 2 1 2 2 2 2 consortium\_6\_125689 4 1 1 1 1 1 1 4 1 1 1 1 1 4 1 1 1 4 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 4 4 4 4 4 4 4 4 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 4 4 4 4 4 4 1 1 1 1 3 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 4 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 4 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 2 2 2 2 2 4 4 4 4 4 4 4 4 4 1 1 1 1 1 3 4 4 4 4 4 4 4 3 4 4 4 4 2 1 1 1 1 1 1 1 1 2 2 3 4 4 4 4 4 4 4 1 1 1 1 1 1 1 1 1 2 1 1 1 1 2 1 1 1 2 2 3 4 1 1 1 1 1 1 3 2 4 4 4 4 4 4 4 4 4 4 4 1 1 2 1 1 1 3 4 4 1 3 1 1 1 4 4 1 4 4 4 4 4 4 4 4 1 1 4 4 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 2 1 3 4 4 2 4 4 4 4 4 4 consortium\_6\_123569 4 1 1 1 1 1 1 4 1 1 1 
1 1 4 1 1 1 1 1 3 3 4 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 4 4 4 4 4 4 4 1 1 1 1 1 1 1 1 1 1 3 4 4 4 4 4 1 1 4 1 1 1 4 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 1 1 1 1 1 1 2 1 2 1 1 2 1 2 2 2 2 2 2 1 2 1 1 2 2 1 1 1 2 1 2 2 2 1 1 2 2 2 2 2 2 2 2 1 1 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 2 1 2 2 1 1 2 1 1 2 1 1 1 2 3 4 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 1 2 2 1 2 2 4 4 4 3 1 1 1 4 1 1 1 1 1 1 2 1 1 1 1 2 1 1 1 1 1 3 4 1 1 1 1 1 1 1 1 1 1 1 1 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 1 2 2 2 1 2 1 2 2 2 1 2 1 2 1 2 1 1 1 1 2 1 2 1 2 2 2 2 1 2 1 2 1 1 2 1 1 2 1 1 1 2 2 1 1 2 2 1 1 1 consortium\_6\_125679 4 1 1 1 1 1 1 4 1 1 1 1 1 4 1 1 1 1 1 3 3 4 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 4 4 4 1 1 1 4 1 4 1 1 4 4 4 4 4 1 1 1 1 1 1 1 4 1 1 1 1 1 4 4 4 4 1 1 1 4 4 4 4 4 1 1 2 1 2 2 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 3 1 1 1 3 1 1 1 1 4 1 2 1 2 1 2 4 4 1 4 4 1 1 1 1 1 1 2 3 4 4 1 1 1 1 1 1 1 1 4 4 1 1 1 1 1 1 4 4 1 1 1 1 1 1 1 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 4 2 2 3 4 1 1 4 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 4 4 1 1 1 1 1 1 1 3 1 1 4 4 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 4 1 1 1 2 1 1 2 1 1 1 1 1 1 1 2 1 1 2 1 1 1 1 2 2 2 2 1 2 2 1 2 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 4 1 1 1 1 1 4 4 1 1 1 1 1 1 1 1 1 4 1 1 2 1 1 1 1 1 1 1 1 1 consortium\_6\_126789 4 1 1 1 1 1 1 4 1 1 1 1 1 4 1 1 4 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 4 2 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 2 1 1 3 1 1 1 1 4 1 1 1 2 4 4 2 2 4 1 4 1 1 2 2 4 1 1 1 1 1 1 4 4 4 1 4 1 1 1 1 1 1 1 4 4 1 1 1 4 4 4 4 1 1 1 1 1 4 1 1 2 2 1 4 4 4 4 4 4 4 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 2 1 1 1 3 1 1 1 1 1 1 1 1 1 4 4 1 1 1 1 4 1 1 1 1 4 1 1 1 4 1 3 2 2 2 4 3 1 1 1 1 1 4 4 4 4 4 4 4 4 1 1 1 1 4 4 4 4 1 1 1 1 1 1 1 4 4 4 4 4 4 4 4 4 1 1 1 1 1 3 2 3 4 2 2 1 4 4 2 3 4 4 2 2 1 4 4 4 4 4 4 1 1 1 1 1 1 1 1 
1 1 1 1 1 1 1 4 4 4 4 4 1 1 1 4 4 1 1 1 4 4 4 4 4 4 4 4 4 4 4 4 4 4 1 4 4 4 4 4 4 4 4 4 4 4 2 2 2 3 4 4 1 4 4 4 4 4 1 1 1 1 1 4 1 1 1 1 1 1 4 4 4 4 1 4 4 4 4 4 4 4 4 4 4 1 1 1 3 3 1 1 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 consortium\_7\_1234569 4 1 1 1 1 1 1 4 1 1 1 1 1 4 1 1 1 1 1 3 3 4 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 2 1 1 2 1 1 1 1 1 1 1 1 4 4 4 4 1 4 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 3 3 4 4 4 1 1 4 4 4 1 1 1 1 1 1 1 1 1 1 4 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 2 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 2 4 1 4 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 2 1 2 1 1 2 1 1 1 3 4 4 4 4 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 2 2 1 1 1 1 1 1 1 1 1 1 1 2 1 2 1 1 1 1 1 1 2 2 2 2 2 2 2 1 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 2 1 1 4 2 1 1 1 1 2 1 1 4 1 1 1 1 1 1 1 1 1 2 1 1 3 4 2 2 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 2 2 1 1 1 1 1 1 2 4 2 1 1 1 consortium\_7\_1345689 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 1 1 2 1 1 2 1 1 2 1 2 2 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 2 1 1 2 1 3 1 3 4 1 1 1 1 1 1 2 1 4 1 1 2 1 1 2 1 2 1 1 1 1 1 2 1 1 2 1 2 1 2 1 1 1 1 1 1 2 1 1 1 2 2 2 2 2 3 4 1 1 1 1 1 1 1 1 1 1 2 2 1 2 1 2 2 1 2 1 2 2 2 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 1 1 2 2 2 2 2 1 4 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 2 2 1 4 1 1 1 1 1 1 1 1 1 1 1 4 2 4 2 2 1 1 1 1 1 1 1 1 1 2 1 1 1 2 1 2 2 2 2 2 2 2 2 2 1 1 1 1 1 2 2 1 1 2 2 3 1 4 1 1 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 1 2 2 1 4 2 1 1 1 2 1 1 2 1 1 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 1 1 1 2 2 1 2 3 4 1 2 2 2 2 2 2 4 2 2 2 2 1 1 1 2 1 consortium\_6\_123579 4 1 1 1 1 1 1 4 1 1 1 1 1 1 4 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 2 1 1 1 2 3 2 2 1 1 1 1 1 1 1 1 4 1 1 1 4 1 4 1 1 4 1 1 1 4 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1 3 4 4 1 1 1 1 4 1 1 1 1 3 1 1 1 1 2 1 1 1 
1 2 1 1 1 1 1 1 1 1 2 2 3 4 1 2 1 1 1 2 4 3 1 1 1 1 1 2 1 1 2 1 1 1 1 4 1 4 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 3 1 1 1 1 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 3 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 4 1 1 1 1 2 2 2 2 1 3 1 1 1 1 1 1 1 1 1 1 2 1 3 4 4 3 4 1 1 1 1 1 1 1 4 1 1 4 1 1 1 2 2 4 4 1 2 1 1 1 1 1 4 1 1 1 1 1 2 2 2 1 1 1 1 1 1 1 3 1 1 2 1 1 2 4 1 2 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 2 1 1 1 1 1 1 2 2 1 1 1 1 1 2 2 2 2 2 2 2 1 2 1 1 consortium\_6\_125678 4 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 2 1 1 2 1 1 1 2 1 4 4 1 1 4 4 4 4 4 4 4 4 1 2 1 1 2 1 1 1 4 4 4 3 1 1 1 1 1 1 4 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 4 1 1 1 1 1 4 1 1 1 1 1 1 4 1 1 1 1 1 1 1 4 4 4 4 4 4 4 4 4 1 1 1 1 4 1 1 1 2 1 1 1 1 1 1 1 1 3 1 1 3 1 1 1 1 1 1 1 1 2 2 1 1 1 1 1 1 2 1 1 1 1 1 4 1 4 1 1 1 2 1 1 2 1 1 1 1 2 4 4 1 1 1 2 1 1 1 2 4 4 1 1 1 1 1 1 1 2 1 1 1 1 1 4 4 4 1 1 1 1 1 1 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 4 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 4 4 4 1 1 1 1 1 1 1 1 1 4 4 4 4 4 4 4 4 4 1 1 1 1 1 1 1 1 4 4 4 4 4 4 4 4 4 1 1 1 1 1 1 4 1 1 1 1 1 1 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 4 1 2 1 1 4 2 4 2 2 1 1 2 3 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 consortium\_7\_1345789 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 2 1 1 2 1 1 1 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 1 2 2 1 1 1 2 2 2 1 1 1 1 1 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 1 2 1 2 2 1 1 2 1 1 1 2 1 1 2 1 1 1 1 1 2 1 1 1 1 1 2 2 2 1 2 2 2 2 2 1 2 2 1 1 1 2 2 2 2 1 1 2 2 2 1 2 2 2 1 2 2 1 1 1 2 1 1 1 1 2 1 2 1 1 2 2 2 2 1 1 2 2 1 1 2 2 2 1 2 1 2 2 2 2 2 2 2 2 1 2 2 2 2 2 1 2 2 2 2 1 1 3 1 4 1 1 1 1 3 1 1 1 1 1 1 2 2 1 1 1 2 1 1 1 2 2 1 2 4 2 2 1 1 1 2 2 1 1 1 1 1 1 1 1 1 1 1 2 2 1 1 1 1 1 1 1 1 1 1 1 3 4 2 1 1 1 1 1 1 1 1 1 1 1 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 4 1 1 1 1 1 1 1 4 1 1 1 1 2 1 2 2 
2 2 1 1 4 1 4 1 1 1 1 1 consortium\_7\_1235689 4 1 1 1 1 1 1 4 1 1 1 1 1 4 1 1 1 4 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 4 4 4 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 2 1 1 1 4 4 4 4 4 4 4 4 1 1 1 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 3 1 1 1 1 1 1 4 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 4 4 1 1 1 1 1 1 4 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 2 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 1 1 1 1 1 1 1 1 2 2 1 1 2 2 1 2 1 3 1 1 1 1 1 1 1 1 2 2 2 2 1 1 1 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 2 2 2 2 2 1 2 2 2 1 2 2 2 2 1 2 2 2 2 2 2 2 2 4 2 2 2 2 1 3 1 1 1 1 1 4 1 4 2 2 1 1 1 1 2 2 1 1 1 1 1 1 4 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 consortium\_5\_12489 4 1 1 1 1 1 1 4 1 1 1 1 1 4 1 1 1 1 1 1 1 1 1 1 1 1 3 4 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 2 1 2 1 2 1 1 1 1 1 1 3 4 1 4 1 1 1 1 1 1 1 1 1 1 4 1 1 2 1 4 1 2 1 1 2 1 1 2 1 1 1 1 1 1 1 1 1 2 1 1 1 1 3 4 1 1 1 1 1 1 1 1 1 1 1 1 4 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 2 1 1 1 4 4 1 1 4 1 1 1 1 1 4 1 1 1 1 1 1 1 1 4 1 1 4 2 2 3 3 3 4 4 3 1 1 4 4 1 1 3 1 1 4 4 1 1 1 2 1 1 1 1 1 1 1 1 1 4 2 1 1 1 1 4 1 4 2 2 4 1 1 4 4 1 1 1 1 1 1 1 1 1 1 4 3 4 1 1 1 1 1 1 3 4 1 1 1 1 1 1 1 1 4 1 1 1 1 1 1 1 1 1 3 4 4 1 1 1 1 1 2 4 1 1 4 3 4 4 4 3 1 4 3 4 4 4 2 1 1 1 1 3 3 1 1 1 1 1 3 4 4 2 1 1 1 1 1 1 1 1 1 1 1 1 3 4 4 1 1 1 1 1 1 1 1 4 3 4 4 1 3 4 3 4 1 4 1 1 1 1 1 1 1 1 3 1 4 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 4 1 satspp\_5\_12356 4 1 1 1 1 1 1 1 1 1 1 1 3 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 1 1 3 3 1 2 1 2 2 2 2 1 1 1 1 1 1 1 1 3 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 1 2 2 1 1 1 1 1 1 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 2 2 2 2 1 1 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 2 1 1 2 2 1 1 1 1 2 3 1 1 1 2 1 1 3 1 1 1 2 2 2 1 1 1 1 2 3 1 1 1 2 1 1 1 2 1 1 2 1 2 2 2 2 1 1 1 1 1 1 2 1 1 1 1 1 2 1 1 3 1 1 1 1 1 2 2 2 1 1 2 2 2 2 2 1 
1 1 1 1 1 1 1 2 1 1 1 1 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 3 3 1 2 2 1 1 1 1 1 1 3 3 3 1 1 1 3 2 3 3 2 2 2 2 2 2 1 1 1 2 2 2 1 1 2 1 3 3 1 3 2 3 2 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 2 2 3 2 2 3 2 2 3 2 1 2 3 1 1 3 1 3 3 3 3 1 1 3 1 3 two\_object\_tracker\_7\_1234567 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 1 1 2 2 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 2 1 1 2 2 2 1 1 1 1 2 2 1 1 2 1 2 2 2 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 2 2 2 2 2 2 1 2 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 2 2 1 1 2 1 2 2 1 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 2 2 2 2 4 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 2 2 1 1 1 2 2 2 2 2 1 1 2 1 2 2 1 2 2 2 2 2 1 1 1 1 1 2 2 2 2 2 2 2 2 2 1 2 1 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 2 1 2 1 1 1 2 1 1 1 1 1 2 1 2 2 2 1 2 1 1 1 1 2 2 2 1 2 2 consortium\_6\_124789 4 1 1 1 1 1 1 4 1 1 1 1 1 4 1 1 1 4 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 3 1 3 1 1 1 1 2 1 2 3 3 1 1 1 1 1 4 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 3 1 1 1 1 1 1 3 4 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 4 1 2 2 4 1 1 1 2 1 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 4 1 1 1 1 1 1 1 1 1 4 1 1 1 1 1 1 1 2 1 1 1 2 2 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 3 4 2 1 1 1 1 4 1 1 1 1 1 1 1 1 1 1 1 1 3 4 2 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 3 1 1 1 1 1 1 1 4 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 3 4 1 1 1 4 4 1 4 1 1 4 1 1 1 1 1 2 2 4 4 4 1 1 1 1 1 1 1 1 4 4 1 4 4 1 1 1 1 1 1 1 4 2 2 2 3 4 1 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 4 1 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 4 consortium\_6\_123567 4 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 2 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 3 1 1 1 2 1 2 2 1 1 1 4 1 4 4 4 1 1 1 4 1 1 1 1 4 4 4 4 4 1 4 1 1 1 2 1 1 1 1 1 1 2 2 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 4 1 1 1 1 1 1 1 2 2 1 1 1 4 1 1 1 1 2 4 1 1 1 1 1 1 1 
1 1 1 1 1 1 1 1 1 1 1 2 2 1 1 2 2 1 2 2 1 2 2 1 1 2 2 2 1 1 1 1 1 1 4 4 4 1 4 4 4 1 3 1 2 1 2 1 1 1 1 1 1 1 1 1 1 1 2 1 1 4 4 4 4 3 1 1 1 1 2 2 2 2 1 1 2 2 1 2 1 1 2 2 2 1 1 2 2 2 2 1 2 2 1 1 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 2 1 1 1 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 consortium\_5\_12389 4 1 1 1 1 1 1 1 1 1 1 1 1 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 2 1 1 1 1 2 3 1 3 1 1 1 4 4 1 3 1 1 1 1 1 1 2 2 2 1 1 1 1 1 1 1 1 2 1 3 1 1 1 1 1 3 1 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 4 4 4 4 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 4 1 1 4 4 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 4 4 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 4 4 1 1 4 1 1 1 3 1 1 1 1 1 1 1 1 4 4 1 1 1 1 3 1 3 4 4 1 4 3 1 1 4 4 1 4 3 2 1 2 2 1 1 1 2 1 1 1 1 3 4 1 1 1 1 3 1 1 1 1 2 4 4 3 4 1 2 2 3 2 1 4 4 4 4 4 4 4 3 4 2 2 4 3 4 4 1 3 4 1 1 1 4 4 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 4 1 4 1 1 1 1 1 1 1 4 1 1 1 1 1 2 2 1 1 4 1 1 1 1 1 1 1 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 consortium\_5\_12589 4 1 1 1 1 1 1 4 1 1 1 1 1 4 1 1 1 1 1 1 1 1 1 1 1 3 1 4 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 4 1 4 1 4 1 4 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 2 1 2 3 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 3 1 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 2 1 2 2 2 2 2 4 4 4 1 1 3 3 1 4 4 1 1 2 1 2 4 3 4 4 4 1 1 2 2 1 4 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1 4 1 3 1 1 2 2 1 4 4 4 2 3 1 3 3 3 4 1 4 4 1 4 1 1 4 4 1 1 4 4 4 4 4 4 1 1 4 1 4 1 1 4 4 1 1 1 2 3 1 1 2 2 2 2 3 1 1 1 1 1 1 2 1 4 1 1 1 2 1 3 1 1 3 1 1 1 1 1 2 3 3 1 1 2 4 4 1 1 1 1 1 1 1 2 1 1 1 2 2 2 1 1 sats\_6\_123468 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 2 1 1 2 1 1 1 2 2 2 1 1 1 1 1 1 1 3 2 1 1 1 3 2 2 2 2 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 2 1 1 1 1 2 1 1 1 1 1 2 1 1 2 1 1 1 2 1 1 3 3 
1 1 2 2 2 1 1 1 2 1 1 2 1 2 1 2 2 2 1 1 1 1 2 2 2 1 1 1 1 2 1 2 1 1 2 1 1 1 2 1 1 1 1 2 1 1 1 1 1 2 2 1 1 1 2 1 1 1 1 1 2 1 1 2 1 1 2 1 1 1 1 1 2 1 1 2 2 1 1 1 1 1 2 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 2 1 1 1 1 2 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 2 1 1 1 2 1 1 1 1 1 1 1 1 2 1 1 1 1 1 2 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 2 1 2 1 2 1 1 1 2 1 1 1 1 1 1 1 1 1 1 2 2 1 4 2 2 1 1 1 4 2 3 1 2 4 3 3 4 3 1 1 1 2 3 1 1 1 1 1 1 2 2 1 1 2 1 1 1 1 1 1 3 1 1 2 1 1 2 1 4 1 1 1 1 1 2 1 1 4 3 3 1 1 1 3 satspp\_5\_12346 4 1 1 1 1 1 1 1 1 1 1 1 3 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 2 1 2 2 2 2 1 1 2 1 1 2 2 2 2 3 3 1 1 1 1 1 1 2 1 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 2 1 1 2 1 1 1 2 2 1 1 1 3 1 3 1 1 1 1 1 1 1 1 1 3 1 3 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 1 3 3 3 1 1 2 1 1 1 2 1 1 1 2 2 2 1 1 2 1 2 1 2 2 3 3 3 3 2 2 3 1 1 3 1 3 3 1 3 3 3 3 3 3 3 3 3 1 1 1 1 2 1 2 2 2 2 1 1 1 1 2 1 2 2 2 2 1 1 1 1 1 1 2 1 2 2 2 2 1 1 1 1 2 1 2 2 2 2 1 1 1 1 1 1 1 1 1 2 1 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 2 2 1 1 2 2 1 1 1 1 1 1 2 1 2 2 2 1 2 1 1 3 1 3 2 2 2 2 1 1 1 1 3 1 1 3 1 3 2 2 2 2 1 1 1 3 3 1 3 1 1 1 1 3 1 1 3 2 2 2 2 1 1 1 1 1 consortium\_6\_124679 4 1 1 1 1 1 1 4 1 1 1 1 1 4 1 1 1 1 1 3 3 4 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 2 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 3 1 1 1 1 1 1 1 4 4 4 1 1 2 4 1 4 1 1 1 1 1 1 1 2 1 1 4 4 4 1 4 4 1 1 2 1 1 1 1 1 2 1 1 1 1 1 4 1 1 4 4 4 4 1 1 4 4 1 1 1 1 3 4 1 1 1 1 1 1 2 1 1 1 1 1 1 2 2 3 4 4 1 1 1 1 1 1 1 1 1 1 1 1 4 4 4 4 1 1 4 1 1 1 1 1 2 1 1 1 1 2 3 4 4 4 1 1 1 1 1 2 1 1 1 1 1 2 3 2 1 1 1 1 1 1 2 4 4 1 1 1 1 4 1 1 1 1 4 1 1 1 4 1 3 1 4 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 4 4 3 1 1 1 1 4 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 4 1 1 1 1 1 1 1 4 1 1 1 1 1 1 1 1 1 1 1 2 2 2 1 1 1 3 satspp\_3\_126 4 1 1 1 1 1 1 1 1 1 1 1 3 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 1 1 3 3 2 1 2 2 2 2 2 1 1 2 1 1 1 1 1 1 1 2 1 2 1 1 2 2 1 1 1 1 1 1 1 2 1 1 2 2 1 1 1 1 2 2 2 
1 1 1 1 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 3 1 1 1 1 1 1 1 3 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 2 2 2 2 1 2 2 1 1 1 1 2 2 2 1 1 2 2 2 1 1 1 1 1 1 2 1 2 1 1 2 1 3 3 3 1 1 1 1 3 1 1 3 1 3 2 1 2 1 1 2 1 1 1 3 3 3 3 2 2 1 1 1 2 1 1 1 2 1 1 1 1 1 2 1 1 3 1 2 1 2 3 1 1 1 1 2 3 3 1 1 3 3 2 3 1 1 3 3 3 3 3 3 2 1 2 2 2 2 2 2 1 2 1 2 2 2 2 2 1 2 2 2 2 1 1 2 1 2 2 2 2 2 2 1 2 2 2 2 2 2 1 1 3 2 2 2 2 2 2 2 1 1 2 2 3 1 1 3 1 1 3 consortium\_5\_12579 4 1 1 1 1 1 1 4 1 1 1 1 1 1 4 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 3 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 4 1 1 3 2 3 1 1 1 1 3 1 1 3 2 2 2 1 2 2 2 2 1 2 2 2 1 1 1 2 1 1 2 1 1 1 1 1 1 1 1 1 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 4 1 1 4 4 4 1 1 1 1 4 4 1 1 1 1 1 4 1 4 1 1 1 1 1 1 1 1 1 1 1 4 4 1 1 1 1 1 2 1 1 2 1 4 1 1 1 1 4 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 4 1 1 1 1 4 1 1 1 1 1 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 4 4 4 1 1 1 1 1 1 1 1 2 1 1 3 2 2 4 4 1 3 2 1 1 1 2 1 2 1 1 2 1 1 1 1 1 1 1 4 2 2 3 4 4 2 1 1 1 2 1 1 1 1 3 1 1 1 1 1 1 1 1 1 4 1 1 1 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 consortium\_7\_1346789 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 4 1 1 1 1 1 2 2 1 1 1 2 1 1 2 1 1 1 1 1 2 1 1 1 1 1 2 3 4 4 1 1 1 1 1 1 2 1 1 1 1 2 2 2 1 1 2 1 2 1 1 2 2 2 2 1 1 1 2 1 1 1 1 1 2 1 1 4 1 1 1 1 1 1 1 2 2 1 1 1 1 1 2 1 1 1 1 1 2 1 2 1 1 2 2 1 1 1 1 1 4 1 1 2 1 3 4 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 4 1 1 1 1 1 1 1 1 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 4 2 1 1 1 1 1 1 1 1 1 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 4 2 1 1 1 1 1 1 1 1 1 1 1 2 2 4 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 1 1 1 1 1 1 2 2 4 2 consortium\_7\_1234568 4 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 2 1 1 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 1 1 1 1 3 1 1 1 2 1 1 1 1 1 2 1 1 1 1 1 1 2 1 1 2 1 1 1 1 1 1 3 1 1 2 1 2 
2 1 4 4 4 4 4 4 4 4 4 4 1 1 1 4 1 1 1 4 1 1 4 4 1 1 4 4 4 4 4 4 4 4 4 4 4 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 4 1 1 1 1 1 1 1 1 1 1 1 3 4 2 1 2 2 2 2 2 2 2 2 1 1 1 1 1 1 2 2 4 2 1 4 1 1 1 1 1 1 1 3 4 1 1 1 1 1 1 1 4 4 4 4 4 4 4 1 1 4 4 4 4 4 1 4 2 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 4 4 2 2 2 2 2 1 1 2 1 1 1 1 1 3 4 4 4 4 4 4 4 4 4 3 1 1 2 3 1 1 1 1 1 1 4 4 4 4 4 4 4 4 4 4 4 1 1 1 1 1 1 1 1 1 1 1 1 2 4 2 1 consortium\_6\_235689 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 2 2 2 2 1 1 1 2 2 2 2 2 2 2 2 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 2 1 1 1 1 1 2 2 2 2 2 2 2 2 2 1 1 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 1 2 2 2 1 1 1 1 1 2 2 2 2 2 2 2 2 2 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 2 2 2 2 2 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 2 2 2 1 2 2 2 4 1 1 2 2 2 2 2 1 2 2 4 2 1 2 2 1 1 1 4 2 2 2 2 1 2 consortium\_8\_13456789 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 2 2 2 1 1 2 2 1 1 2 1 1 1 1 2 1 2 2 1 1 4 1 1 1 1 1 1 1 2 1 1 1 2 1 2 1 1 1 1 1 1 2 4 4 1 1 1 3 1 1 1 1 1 1 1 1 1 1 2 2 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 2 1 1 1 3 2 4 2 2 1 1 1 1 1 2 2 1 1 1 1 1 1 2 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 2 1 1 2 2 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 1 1 1 2 4 1 2 3 2 1 1 1 1 2 2 1 2 2 1 2 1 1 1 2 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 4 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 2 2 1 1 1 1 1 2 1 2 2 2 2 2 1 2 1 4 1 1 3 1 1 1 1 1 1 4 1 3 consortium\_5\_12349 4 2 1 1 1 1 4 1 1 1 1 1 4 1 1 1 1 1 1 1 3 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 2 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 2 1 1 1 3 1 1 1 1 1 1 1 1 1 2 1 1 1 1 4 1 1 1 1 2 1 1 1 1 4 1 1 1 2 1 3 1 1 4 4 1 1 4 
1 1 2 1 1 2 2 1 1 1 1 3 1 4 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 2 1 2 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 2 2 2 1 3 4 1 1 1 1 1 1 1 1 2 1 2 1 1 1 2 2 1 2 1 1 3 1 1 2 2 1 1 2 2 1 4 4 1 1 1 1 1 1 1 1 3 4 3 1 4 4 1 1 1 1 1 1 1 1 1 1 1 1 3 4 4 4 2 2 3 2 4 1 1 2 1 1 1 2 1 1 2 1 2 1 1 4 1 1 2 1 1 1 1 1 1 2 1 1 1 1 1 2 1 4 1 1 1 3 1 consortium\_5\_12479 4 1 1 1 1 1 1 4 1 1 1 1 1 4 1 1 1 1 1 1 1 3 3 4 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 2 3 1 1 1 1 1 1 1 1 4 2 1 2 4 1 4 1 2 1 1 4 1 1 1 1 1 1 1 1 3 1 3 1 1 4 1 1 1 1 1 1 1 1 1 4 1 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 4 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 4 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 2 2 1 1 2 1 1 1 3 1 1 1 1 3 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 1 4 3 4 1 1 1 1 1 1 4 1 4 1 1 1 3 3 4 4 1 1 1 1 1 1 1 3 1 1 1 1 2 1 1 3 1 4 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 3 4 4 1 1 1 1 1 1 4 1 1 1 1 3 4 4 2 4 2 1 1 1 4 1 4 1 1 1 1 1 1 1 1 1 3 consortium\_6\_124579 4 1 1 1 1 1 1 4 1 1 1 1 1 1 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 1 1 1 1 1 2 2 1 2 1 2 1 1 1 1 4 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 3 2 1 2 2 2 2 1 1 2 2 2 1 2 2 1 1 1 2 1 2 2 2 1 2 2 1 1 2 1 1 2 1 1 1 2 2 1 1 1 1 1 4 1 1 1 1 2 2 2 2 2 1 1 4 4 2 1 1 3 1 1 1 1 1 1 1 1 1 1 1 4 1 2 1 1 1 2 1 1 1 1 2 2 2 2 3 1 1 2 1 1 1 1 1 1 2 2 2 1 1 1 2 1 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 2 2 1 1 1 1 1 1 4 2 1 1 1 1 1 1 3 2 1 1 1 1 3 3 1 1 1 1 1 1 1 1 2 2 1 3 4 1 1 1 1 1 1 1 1 4 1 1 1 1 1 1 1 1 1 1 1 3 1 kvs\_5\_12345 4 1 2 2 2 2 2 1 4 4 4 4 4 4 4 4 2 4 4 4 2 4 4 2 2 2 2 4 2 2 4 2 4 4 4 4 4 2 2 1 1 2 2 4 4 4 4 4 4 4 4 2 4 4 4 4 4 1 1 2 4 2 4 4 2 4 4 4 4 4 4 2 4 4 4 4 4 4 4 2 2 4 1 2 4 4 4 2 2 1 4 1 2 1 2 4 1 1 2 4 1 2 4 1 4 2 1 1 4 4 2 2 2 1 2 2 2 2 4 4 2 4 2 4 2 4 1 4 2 2 4 1 2 2 2 4 4 4 1 2 4 2 2 2 4 4 4 4 4 4 2 2 2 2 2 1 4 4 4 4 4 2 2 1 2 4 4 
1 1 1 2 2 2 1 1 1 2 2 2 2 2 1 4 2 4 4 4 4 2 1 2 1 4 1 1 1 4 4 4 2 1 1 2 4 2 4 4 4 2 1 1 1 4 4 4 1 2 2 2 2 2 1 1 1 2 1 1 4 4 4 4 4 2 2 2 2 1 1 4 4 1 1 1 1 2 4 1 1 1 1 4 4 4 4 4 4 4 1 4 4 4 4 4 1 1 1 1 2 2 2 1 2 2 2 2 2 2 2 2 2 1 2 2 1 2 1 4 1 1 1 4 4 4 4 1 1 consortium\_8\_12345678 4 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 2 1 1 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 1 1 1 1 3 1 1 1 2 1 1 1 1 1 2 1 1 1 1 1 1 2 1 1 2 1 1 1 1 1 1 3 1 1 2 1 2 2 1 4 1 4 4 4 4 4 4 4 4 4 1 1 1 4 1 1 1 4 1 1 4 4 1 1 4 4 4 4 4 4 4 4 4 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 4 1 1 4 4 4 4 4 4 1 1 1 1 1 1 1 1 1 1 1 4 4 4 4 1 1 1 1 1 1 1 4 1 1 1 1 1 1 1 1 1 1 4 3 1 1 1 1 1 1 1 1 1 4 4 4 4 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 4 2 1 2 2 1 1 consortium\_7\_1356789 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 1 2 2 1 2 1 1 1 1 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 1 1 1 1 2 2 1 1 1 1 2 2 1 4 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 4 4 1 1 2 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 2 2 1 1 1 1 2 1 1 1 1 1 3 1 1 1 1 1 1 1 1 2 2 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 2 1 4 1 1 1 1 1 1 1 1 4 1 1 1 1 1 1 1 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 4 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 1 1 1 1 1 1 2 1 1 consortium\_5\_12359 4 1 1 1 1 1 1 4 1 1 1 1 1 1 4 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 2 1 1 2 1 3 2 2 1 1 1 1 1 1 1 1 1 1 1 1 2 2 1 1 1 1 1 1 1 1 1 1 1 1 4 1 1 1 4 1 1 1 1 1 1 4 1 1 4 1 1 1 1 1 1 1 1 1 1 1 1 2 1 2 2 1 1 3 4 1 1 1 1 1 1 1 2 2 1 1 1 2 2 1 1 1 1 1 1 4 4 1 1 1 1 1 1 1 2 1 2 2 3 4 4 1 1 1 1 1 1 1 1 1 3 4 1 1 1 1 1 1 1 1 1 1 4 1 4 1 1 1 2 1 2 2 2 2 2 1 1 1 1 1 1 1 2 1 2 1 3 1 4 1 1 2 1 3 3 1 1 1 4 4 1 3 4 4 1 1 1 2 1 1 2 2 2 1 1 1 3 4 3 1 1 2 1 1 1 2 1 1 2 1 1 2 2 1 1 1 1 1 1 1 1 4 3 
Fig. 4: A property-based visualization of Cinnabar's iterations for a representative subset of the variants. Each line corresponds to Cinnabar's execution of a synthesis variant of a benchmark. From left to right, each line starts with iteration 1, ends with the iteration where a correct interpretation was found, and is colored to indicate the nature of the violations encountered throughout the execution. For instance, a line could indicate that Cinnabar encountered a phase-compatibility violation in iteration 1, then a cutoff-amenability violation in iteration 2, ..., and finally was able to find a correct interpretation in iteration 6.

of Mercury, it is more likely to be safe. There are two factors that contribute to this: (i) phase-compatible systems move in a structured way and are more likely to be "closer" to a correct version of the system, and (ii) because cutoff-amenability depends on the safety specification, satisfying cutoff-amenability means the interpreted process sketch is more likely to be correct with respect to the safety property already. Finally, eliminating liveness violations ensures that Cinnabar is able to synthesize higher-quality completions. As shown in the figure, liveness violations are often encountered in the very first iteration, as the SMT-based learner tends to favor interpretations with disabled guards that trivially satisfy phase-compatibility, cutoff-amenability, and safety properties.

Usability. If Cinnabar fails to synthesize a correct completion, the designer can replace existing expressions in the sketch with uninterpreted functions, allowing Cinnabar to explore a larger set of possible candidate completions.

Finally, while the supported uninterpreted functions may not correspond to large segments of the code or complex control-flow constructs, they are the main "knobs" that the designer needs to turn to ensure that their systems belong to the efficiently-decidable fragment of Mercury.

### 6 Related Work

Aiding System Designers via Decidable Verification. Ivy [29] adopts an interactive approach to aid the designer in searching for inductive invariants for their systems. Ivy translates the system model and its invariant to EPR [30], and looks for a counterexample-to-induction (CTI). The designer adjusts the invariant to eliminate that CTI and Ivy starts over. I4 [26] builds on Ivy by first considering a fixed system size, automatically generating a potential inductive invariant, and using Ivy to check if that invariant is also valid for any system size. The approach in [11] identifies a class of asynchronous systems that can be reduced to an equivalent synchronized system modeled in the Heard-Of Model [9]. The designer manually annotates the asynchronous system to facilitate the reduction, and encodes the resulting Heard-Of model in the CL [14] logic which has a semi-decision procedure. These approaches differ from ours in two ways. First, the designer needs to manually provide/manipulate inductive invariants and/or annotations to eventually enable decidable verification. Second, these approaches are "verification only": they require a fully-specified model that either meets or violates its correctness properties and the designer is responsible for adjusting the model if verification fails. Cinnabar, on the other hand, accepts a sketch that is then completed to meet its properties.

Parameterized Synthesis. Jacobs and Bloem [20] introduced a general approach for parameterized synthesis based on cutoffs, where they use an underlying fixed-size synthesis procedure that is required to guarantee that the conditions for cutoffs are met by the synthesized implementation. Our approach can be seen as an instantiation of this approach, as one of the stages in our multi-stage counterexample-based loop ensures that cutoff-amenability conditions hold on any candidate process. Other approaches that tackle the parameterized synthesis problem without cutoff results are more specialized. For instance, the approach in [24] adopts a CEGIS-based synthesis strategy where the designer provides a threshold automaton with some parameters unspecified. Synthesis completes the model and uses the parameterized model checker in [23] to check the system. A similar idea, but based on the notion of well-structured transition systems, is used for the automatic repair of parameterized systems in [21]. The approach in [22] targets parameterized synthesis for self-stabilizing rings, and shows that the problem is decidable even when the corresponding parameterized verification problem is not. The designer provides a set of legitimate states and the size of the template process, and the procedure yields a completed self-stabilizing template. A similar approach for more general topologies is presented in [28]. Bertrand et al. [6] target systems composed of an unbounded number of agents that are fully specified and one underspecified controller process. The synthesis goal is to synthesize a controller that controls all agents uniformly and guides them to a specific desired state. Markgraf et al. [27] also target synthesis of controllers by posing the problem as an infinite-duration 2-player game and utilize regular model checking and the L\* algorithm [4] to learn correct-by-design controllers. 
These approaches are not applicable to our setup as they do not admit distributed agreement-based systems (modeled in Mercury).

Synthesis of Distributed Systems with a Fixed Number of Processes. Various approaches focus on automated synthesis of distributed systems with a fixed number of processes [3,2,1,12,35]. While such approaches deploy a similar counterexample-guided strategy to complete a user-provided sketch, they do not provide parameterized correctness guarantees nor the necessary agreement primitives needed to model distributed agreement-based systems.

Data availability. The artifact and related data that support the findings of this work are publicly available on Zenodo [18].

### References



## LTL Reactive Synthesis with a Few Hints

Mrudula Balachander, Emmanuel Filiot, and Jean-François Raskin

> Université libre de Bruxelles, Brussels, Belgium mbalacha@ulb.be

Abstract. We study a variant of the problem of synthesizing Mealy machines that enforce LTL specifications against all possible behaviours of the environment, including hostile ones. In the variant studied here, the user provides the high level LTL specification ϕ of the system to design, and a set E of examples of executions that the solution must produce. Our synthesis algorithm first generalizes the user-provided examples in E using tailored extensions of automata learning algorithms, while preserving realizability of ϕ. Second, it turns the (usually) incomplete Mealy machine obtained by the learning phase into a complete Mealy machine realizing ϕ. The examples are used to guide the synthesis procedure. We prove learnability guarantees of our algorithm and prove that our problem, while generalizing the classical LTL synthesis problem, matches its worst-case complexity. The additional cost of learning from E is even polynomial in the size of E and in the size of a symbolic representation of solutions that realize ϕ, computed by the synthesis tool Acacia-Bonzai. We illustrate the practical interest of our approach on a set of examples.

### 1 Introduction

Reactive systems are notoriously difficult to design and even to specify correctly [1,13]. As a consequence, formal methods have emerged as useful tools to help designers build reactive systems that are correct. For instance, model checking asks the designer to provide a model, in the form of a Mealy machine $M$, that describes the reactions of the system to events generated by its environment, together with a description of the core correctness properties that must be enforced. Those properties are expressed in a logical formalism, typically as an LTL formula $\varphi_{CORE}$. Then an algorithm decides if $M \models \varphi_{CORE}$, i.e. if all executions of the system in its environment satisfy the specification. Automatic reactive synthesis is more ambitious: it aims at automatically generating a model from a high-level description of "what" needs to be done instead of "how" it has to be done. Thus the user is only required to provide an LTL specification $\varphi$ and the algorithm automatically generates a Mealy machine $M$ such that $M \models \varphi$ whenever $\varphi$ is realizable. Unfortunately, in most cases it is not sufficient to provide the core correctness properties $\varphi_{CORE}$ to obtain a Mealy machine $M$ that is useful in practice, as illustrated next.

© The Author(s) 2023. S. Sankaranarayanan and N. Sharygina (Eds.): TACAS 2023, LNCS 13994, pp. 309–328, 2023. https://doi.org/10.1007/978-3-031-30820-8_20

Example 1. [Synthesis from $\varphi_{CORE}$ - Mutual exclusion] Let us consider the classical problem of mutual exclusion. In the simplest form of this problem, we need to design an arbiter that receives requests from two processes, modeled by two atomic propositions $r_1$ and $r_2$ controlled by the environment, and that grants access to the critical section, modeled as two atomic propositions $g_1$ and $g_2$ controlled by the system. The core correctness properties (the what) are: (i) mutual access, i.e. it is never the case that the access is granted to both processes at the same time, (ii) fairness, i.e. processes that have requested access eventually get access to the critical section. These core correctness specifications for mutual exclusion (ME) are easily expressed in LTL as follows: $\varphi^{ME}_{CORE} \equiv \Box(\neg g_1 \vee \neg g_2) \wedge \Box(r_1 \rightarrow \Diamond g_1) \wedge \Box(r_2 \rightarrow \Diamond g_2)$. Indeed, this formula expresses the core correctness properties that we would model check no matter how $M$ implements mutual exclusion, e.g. Peterson, Dekker, Bakery algorithms, etc. Unfortunately, if we submit $\varphi^{ME}_{CORE}$ to an LTL synthesis procedure, implemented in tools like Acacia-Bonzai [11], BoSy [17], or Strix [25], we get the solution $M$ depicted in Fig. 1-(left) (all three tools return this solution). While this solution is perfectly correct and realizes the specification $\varphi^{ME}_{CORE}$, the solution ignores the inputs from the environment and grants access to the critical sections in a round-robin fashion. Arguably, it may not be considered as an efficient solution to the mutual exclusion problem. This illustrates the limits of the synthesis algorithm to solve the design problem when only the core correctness specification of the problem, i.e. the what, is provided. To produce useful solutions to the mutual exclusion problem, more guidance must be provided.
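The round-robin behaviour described in Example 1 can be made concrete with a small simulation. The following is a minimal sketch, assuming a two-state machine that alternates grants; the names `round_robin_step`, `run`, `mutual_exclusion`, and `unsolicited_grants` are hypothetical and not taken from any of the tools mentioned. The checks operate on a finite prefix only, so they illustrate the safety part of the specification and a crude per-step count of grants without a matching request, not the full LTL semantics.

```python
from typing import List, Tuple

# Hypothetical encoding: an input is (r1, r2), an output is (g1, g2).
Input = Tuple[bool, bool]
Output = Tuple[bool, bool]

def round_robin_step(state: int, _inp: Input) -> Tuple[int, Output]:
    """Two-state Mealy machine that ignores its input and alternates grants."""
    if state == 0:
        return 1, (True, False)   # grant process 1, move to state 1
    return 0, (False, True)       # grant process 2, move back to state 0

def run(inputs: List[Input], state: int = 0) -> List[Output]:
    """Feed a finite input sequence to the machine and collect the outputs."""
    outputs = []
    for inp in inputs:
        state, out = round_robin_step(state, inp)
        outputs.append(out)
    return outputs

def mutual_exclusion(outputs: List[Output]) -> bool:
    """Safety part of the specification: never both grants at the same step."""
    return all(not (g1 and g2) for g1, g2 in outputs)

def unsolicited_grants(inputs: List[Input], outputs: List[Output]) -> int:
    """Count grants issued at steps where the process did not request access.
    This is a crude per-step measure used only for illustration."""
    return sum(
        int(g1 and not r1) + int(g2 and not r2)
        for (r1, r2), (g1, g2) in zip(inputs, outputs)
    )

if __name__ == "__main__":
    quiet = [(False, False)] * 4   # the environment never requests access
    outs = run(quiet)
    print(outs)
    print(mutual_exclusion(outs), unsolicited_grants(quiet, outs))
```

On an environment that never issues a request, this machine still grants access at every step: mutual exclusion holds, yet every grant is unsolicited, which is the kind of behaviour the core specification alone does not rule out.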

Fig. 1: (Left) The solution of Strix to the mutual exclusion problem for the high-level specification $\varphi^{ME}_{CORE}$. Edge labels are of the form $\varphi/\psi$, where $\varphi$ is a Boolean formula over the input atomic propositions (Boolean variables controlled by the environment) and $\psi$ is a maximally consistent conjunction of literals over the set of output propositions (Boolean variables controlled by the system). (Right) A natural solution that could be drawn by hand, and is automatically produced by our learning/synthesis algorithm for the same specification plus two simple examples.

The main question is now: how should we specify these additional properties? Obviously, if we want to use the "plain" LTL synthesis algorithm, there is no choice: we need to reinforce the specification $\varphi^{ME}_{CORE}$ with additional lower level properties $\varphi^{ME}_{LOW}$. Let us go back to our running example.

Example 2. [Synthesis from $\varphi^{ME}_{CORE}$ and $\varphi^{ME}_{LOW}$] To avoid solutions with unsolicited grants, we need to reinforce the core specification. The Strix online demo website proposes to add the following 3 LTL formulas $\varphi^{ME}_{LOW}$ to $\varphi^{ME}_{CORE}$ (see Full arbiter n = 2, at https://meyerphi.github.io/strix-demo/): (1) $\bigwedge_{i \in \{1,2\}} \Box((g_i \wedge \Box \neg r_i) \rightarrow \Diamond \neg g_i)$, (2) $\bigwedge_{i \in \{1,2\}} \Box(g_i \wedge \bigcirc(\neg r_i \wedge \neg g_i) \rightarrow \bigcirc(r_i \mathbin{R} \neg g_i))$, and (3) $\bigwedge_{i \in \{1,2\}} (r_i \mathbin{R} \neg g_i)$. Strix, on the specification $\varphi^{ME}_{CORE} \wedge \varphi^{ME}_{LOW}$, provides us with a better solution, but it is more complex than needed (it has 9 states, see [5]) and clearly does not look like an optimal solution to our mutual exclusion problem. E.g., the model of Fig. 1-(right) is arguably more natural. How can we get this model without coding it into the LTL specification, which would greatly diminish the interest of using a synthesis procedure in the first place?

In general, higher level properties are properties that need to be met by all implementations, e.g. safety-critical properties. In contrast, lower level properties are more about a specific implementation, its expected behaviour and efficiency. At this point, it is legitimate to question the adequacy of LTL as a specification language for lower level properties, and so as a way to guide the synthesis procedure towards relevant solutions that realize $\varphi_{CORE}$. In this paper, we introduce an alternative way to guide synthesis toward useful solutions that realize $\varphi_{CORE}$: we propose to use examples of executions that illustrate behaviors of expected solutions. We then restrict the search to solutions that generalize those examples. Examples, or scenarios of executions, are accepted in requirements engineering as an adequate tool to elicit requirements about complex systems [12]. For reactive system design, examples are particularly well-suited as they are usually much easier to formulate than full blown solutions, or even partial solutions. This is because, when formulating examples, the user controls both the inputs and the outputs, which avoids the main difficulty of reactive system design: having to cope with all possible environment inputs. We illustrate this on our running example.

Example 3. [Synthesis from $\varphi^{ME}_{CORE}$ and examples] Let us keep, as the LTL specification, $\varphi^{ME}_{CORE}$ only, and let us consider the following simple prefixes of executions that illustrate how solutions to mutual exclusion should behave:

(1) {!r1, !r2}.{!g1, !g2}#{r1, !r2}.{g1, !g2}#{!r1, r2}.{!g1, g2}

(2) {r1, r2}.{g1, !g2}#{!r1, !r2}.{!g1, g2}

These trace prefixes prescribe reactions to typical fixed finite input sequences: (1) if there is no request initially, then no access is granted (note that this already excludes the round robin solution); if processes 1 and 2 then request one after the other, process 1 is granted first and process 2 is granted afterwards; (2) if both processes request simultaneously, then process 1 is granted first and process 2 is granted afterwards. Given those two simple traces together with $\varphi^{ME}_{CORE}$, our algorithm generates the solution of Fig. 1-(right). Arguably, the solution is now simple and natural.
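
To make the trace notation concrete, the following minimal Python sketch parses the two prefixes above into sequences of (input, output) valuations and checks the mutual exclusion conjunct on each step. The helper names `parse_valuation`, `parse_prefix` and `respects_mutual_exclusion` are illustrative assumptions; they are not part of the prototype described in this paper.

```python
def parse_valuation(text):
    """Turn '{r1, !r2}' into {'r1': True, 'r2': False}."""
    literals = [lit.strip() for lit in text.strip().strip("{}").split(",")]
    return {lit.lstrip("!"): not lit.startswith("!") for lit in literals}


def parse_prefix(text):
    """Parse 'in.out#in.out#...' into a list of (input, output) valuations."""
    steps = []
    for step in text.split("#"):
        inp, out = step.split(".")
        steps.append((parse_valuation(inp), parse_valuation(out)))
    return steps


def respects_mutual_exclusion(steps):
    """Check that g1 and g2 are never granted in the same step."""
    return all(not (out["g1"] and out["g2"]) for _, out in steps)


# The two example prefixes of Example 3, copied verbatim.
EXAMPLES = [
    "{!r1, !r2}.{!g1, !g2}#{r1, !r2}.{g1, !g2}#{!r1, r2}.{!g1, g2}",
    "{r1, r2}.{g1, !g2}#{!r1, !r2}.{!g1, g2}",
]
PARSED = [parse_prefix(example) for example in EXAMPLES]
```

Both parsed prefixes satisfy `respects_mutual_exclusion`, as expected of any example compatible with the core specification.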

Contributions First, we provide a synthesis algorithm SynthLearn that, given an LTL specification $\varphi_{CORE}$ and a finite set E of prefixes of executions, returns a Mealy machine M such that $M \models \varphi_{CORE}$, i.e. M realizes $\varphi_{CORE}$, and $E \subseteq Prefix(L(M))$, i.e. M is compatible with the examples in E, if such a machine M exists. It returns unrealizable otherwise. Additionally, we require SynthLearn to generalize the decisions illustrated in E. This learnability requirement is usually formalized in automata learning with a completeness criterion that we adapt here as follows: for all specifications $\varphi_{CORE}$, and for all Mealy machines M such that $M \models \varphi_{CORE}$, there is a small set of examples E (polynomial in |M|) such that $L(SynthLearn(\varphi_{CORE}, E)) = L(M)$. We prove this completeness result in Theorem 4 for safety specifications and extend it to ω-regular and LTL specifications in Section 4, by reduction to safety.

Second, we prove that the worst-case execution time of SynthLearn is 2ExpTime (Theorem 7), and this is worst-case optimal as the plain LTL synthesis problem (when E = ∅) is already known to be 2ExpTime-complete [27]. SynthLearn first generalizes the examples provided by the user while maintaining realizability of $\varphi_{CORE}$. This generalization leads to a Mealy machine with possibly missing transitions (called a preMealy machine). Then, this preMealy machine is extended into a (full) Mealy machine that realizes $\varphi_{CORE}$ against all behaviors of the environment. During the completion phase, SynthLearn reuses as much as possible the decisions that have been generalized from the examples. The generalization phase is essential to get the most out of the examples. Running classical synthesis algorithms on $\varphi_{CORE} \wedge \varphi_E$, where $\varphi_E$ is an LTL encoding of E, often leads to more complex machines that fail to generalize the decisions taken along the examples in E. While the overall complexity of SynthLearn is 2ExpTime and optimal, we show that it is only polynomial in the size of E and in a well-chosen symbolic representation of a set of Mealy machines that realize $\varphi_{CORE}$, see Theorem 6. This symbolic representation takes the form of an antichain of functions and tends to be compact in practice [19]. It is computed by default when Acacia-Bonzai is solving the plain LTL synthesis problem of $\varphi_{CORE}$. So, generalizing examples while maintaining realizability only comes at a marginal polynomial cost. We have implemented our synthesis algorithm in a prototype, which uses Acacia-Bonzai to compute the symbolic antichain representation. We report on the results we obtain on several examples.

Related works Scenarios of executions have been advocated by researchers in requirements engineering to elicit specifications, see e.g. [12,14] and references therein. In [28], learning techniques are used to transform examples into LTL formulas that generalize them. Those methods are complementary to our work, as they can be used to obtain the high level specification ϕCORE.

In non-vacuous synthesis [8], examples are added automatically to an LTL specification in order to force the synthesis procedure to generate solutions that are non-vacuous in the sense of [23]. The examples are generated directly from the syntax of the LTL specification and they cannot be proposed by the user. This makes our approach and this approach orthogonal and complementary. Indeed, we could use the examples generated automatically by the non-vacuous approach and ask the user to validate them as desirable or not. Our method is semi-automatic and user centric: the user can provide any example they like, which offers more flexibility to drive the synthesis procedure to solutions that the user deems interesting. Furthermore, our synthesis procedure is based on learning algorithms, while the algorithm in [8] is based on constraint solving and does not offer guarantees of generalization, unlike our algorithm (see Thm. 4).

Supplementing the formal specification with additional user-provided information is at the core of the syntax-guided synthesis framework (SyGuS [3]), implemented for instance in program by sketching [31]: in SyGuS, the specification is a logical formula and candidate programs are syntactically restricted by a user-provided grammar, to limit and guide the search. The search is done by using counter-example guided inductive synthesis techniques (CEGIS) which rely on learning [32]. In contrast to our approach, examples are not user-provided but automatically generated by model-checking the candidate programs against the specification. The techniques are also orthogonal to ours: SyGuS targets programs syntactically defined by expressions over a decidable background theory, and heavily relies on SAT/SMT solvers. Using examples to synthesise programs (programming by example) has been for instance explored in the context of string processing programs for spreadsheets, based on learning [30], and is a current trend in AI (see for example [26] and the citations therein). However this approach only relies on examples and not on logical specifications.

[4] explores the use of formal specifications and scenarios to synthesize distributed protocols. Their approach also follows two phases: first, an incomplete machine is built from the scenarios and second, it is turned into a complete one. But there are two important differences with our work. First, their first phase does not rely on learning techniques and does not try to generalize the provided examples. Second, in their setting, all actions are controllable and there is no adversarial environment, so they are solving a satisfiability problem and not a realizability problem as in our case. Their problem is thus computationally less demanding than the problem we solve: Pspace versus 2ExpTime for LTL specs.

The synthesis problem targeted in this paper extends the LTL synthesis problem. Modern solutions for this problem use automata constructions that avoid Safra's construction as first proposed in [24], and simplified in [29,18], and more recently in [16]. Efficient implementations of Safraless constructions are available, see e.g. [9,17,25,15]. Several previous works have proposed alternative approaches to improve on the quality of solutions that synthesis algorithms can offer. A popular research direction, orthogonal and complementary to the one proposed here, is to extend the formal specification with quantitative aspects, see e.g. [6,10,22,2], and only synthesize solutions that are optimal.

The first phase of our algorithm is inspired by automata learning techniques based on state merging algorithms like RPNI [21,20]. Those learning algorithms need to be modified carefully to generate partial solutions that preserve realizability of ϕCORE. Proving completeness as well as termination of the completion phase in this context requires particular care.

#### 2 Preliminaries on the reactive synthesis problem

Words, languages and automata An alphabet is a finite set of symbols. A word u (resp. ω-word) over an alphabet Σ is a finite (resp. infinite) sequence of symbols from Σ. We write ε for the empty word, and denote by $|u| \in \mathbb{N} \cup \{\infty\}$ the length of u. In particular, $|\varepsilon| = 0$. For $1 \le i \le j \le |u|$, we let $u[i{:}j]$ be the infix of u from position i to position j, both included, and write $u[i]$ instead of $u[i{:}i]$. The set of finite (resp. ω-) words over Σ is denoted by $\Sigma^*$ (resp. $\Sigma^\omega$). We let $\Sigma^\infty = \Sigma^* \cup \Sigma^\omega$. Given two words $u \in \Sigma^*$ and $v \in \Sigma^\infty$, u is a prefix of v, written $u \preceq v$, if $v = uw$ for some $w \in \Sigma^\infty$. The set of prefixes of v is denoted by Prefs(v). Finite words are linearly ordered according to the length-lexicographic order $\prec_{ll}$, assuming a linear order $<_\Sigma$ over Σ: $u \prec_{ll} v$ if $|u| < |v|$, or $|u| = |v|$ and $u = p\sigma_1 u'$, $v = p\sigma_2 v'$ for some $p, u', v' \in \Sigma^*$ and some $\sigma_1 <_\Sigma \sigma_2$. In this paper, whenever we refer to the order $\prec_{ll}$ for words over some alphabet, we implicitly assume the existence of an arbitrary linear order over that alphabet. A language (resp. ω-language) over an alphabet Σ is a subset $L \subseteq \Sigma^*$ (resp. $L \subseteq \Sigma^\omega$).
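
As a small illustration, the length-lexicographic order can be obtained in Python by sorting on the pair (length, word). This sketch assumes that the linear order on Σ is the built-in order on characters; `length_lex_key` is a hypothetical helper name.

```python
def length_lex_key(word):
    """Sort key: shorter words first, ties broken letter by letter."""
    return (len(word), tuple(word))


WORDS = ["ba", "b", "", "ab", "a"]
ORDERED = sorted(WORDS, key=length_lex_key)
# ORDERED is ["", "a", "b", "ab", "ba"]
```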

In this paper, we fix two alphabets I and O whose elements are called inputs and outputs respectively. Given a word $u \in (IO)^\infty$, we let $in(u) \in I^\infty$ be the word obtained by erasing all O-symbols from u. We define out(u) similarly and naturally extend both functions to languages.

Automata over ω-words A parity automaton is a tuple $A = (Q, Q_{init}, \Sigma, \delta, d)$ where Q is a finite non-empty set of states, $Q_{init} \subseteq Q$ is a set of initial states, Σ is a finite non-empty alphabet, $\delta : Q \times \Sigma \rightarrow 2^Q \setminus \{\emptyset\}$ is the transition function, and $d : Q \rightarrow \mathbb{N}$ is a parity function. The automaton A is deterministic when $|Q_{init}| = 1$ and $|\delta(q, \sigma)| = 1$ for all $q \in Q$ and $\sigma \in \Sigma$. The transition function is extended naturally into a function $Post^* : Q \times \Sigma^* \rightarrow 2^Q \setminus \{\emptyset\}$, defined inductively as follows: $Post^*(q, \varepsilon) = \{q\}$ for all $q \in Q$, and for all $(u, \sigma) \in \Sigma^* \times \Sigma$, $Post^*(q, u\sigma) = \bigcup_{q' \in Post^*(q, u)} \delta(q', \sigma)$.

A run of A on an ω-word $w = w_0 w_1 \dots$ is an infinite sequence of states $r = q_0 q_1 \dots$ such that $q_0 \in Q_{init}$, and for all $i \in \mathbb{N}$, $q_{i+1} \in \delta(q_i, w_i)$. The run r is said to be accepting if the minimal colour it visits infinitely often is even, i.e. $\liminf (d(q_i))_{i \ge 0}$ is even. We say that A is a Büchi automaton when d takes its values in {0, 1} (0-coloured states are called accepting states), a co-Büchi automaton when d takes its values in {1, 2}, a safety automaton if it is a Büchi automaton such that the set of 1-coloured states, called unsafe states and denoted $Q_{usf}$, forms a trap: for all $q \in Q_{usf}$, for all $\sigma \in \Sigma$, $\delta(q, \sigma) \subseteq Q_{usf}$, and a reachability automaton if it is {0, 1}-coloured and the set of 0-coloured states forms a trap.

Finally, we consider the existential and universal interpretations of nondeterminism: under the existential (resp. universal) interpretation, a word $w \in \Sigma^\omega$ is in the language of A if there exists a run r on w such that r is accepting (resp. if all runs r on w are accepting). We denote the two languages defined by these two interpretations by $L^\exists(A)$ and $L^\forall(A)$ respectively. Note that if A is deterministic, then the existential and universal interpretations agree, and we write L(A) for $L^\forall(A) = L^\exists(A)$. For a deterministic automaton A, the set of initial states is a singleton {q}.

For a co-Büchi automaton, we also define a strengthening of the acceptance condition, called K-co-Büchi, which requires, for $K \in \mathbb{N}$, that a run visits states labelled with 1 at most K times to be accepting. Formally, a run $r = q_0 q_1 \dots q_n \dots$ is accepting for the K-co-Büchi acceptance condition if $|\{i \ge 0 \mid d(q_i) = 1\}| \le K$. The language defined by A for the K-co-Büchi acceptance condition and universal interpretation is denoted by $L^\forall_K(A)$. Note that this language is a safety language: if a prefix $p \in \Sigma^*$ of a word is such that A has a run prefix on p that visits states labelled with colour 1 more than K times, then all possible extensions $w \in \Sigma^\omega$ of p are rejected by A.

(Pre)Mealy machines Given a (partial) function f from a set X to a set Y, we denote by dom(f) its domain, i.e. the set of elements $x \in X$ such that f(x) is defined. A preMealy machine $\mathcal{M}$ on an input alphabet I and output alphabet O is a triple $(M, m_{init}, \Delta)$ such that M is a non-empty set of states, $m_{init} \in M$ is the initial state, and $\Delta : M \times I \rightarrow O \times M$ is a partial function. A pair (m, i) is a hole in $\mathcal{M}$ if $(m, i) \notin dom(\Delta)$. A Mealy machine is a preMealy machine such that Δ is total, i.e., $dom(\Delta) = M \times I$.

We define two semantics of a preMealy machine $\mathcal{M} = (M, m_{init}, \Delta)$ in terms of the languages of finite and infinite words over $I \cup O$ it defines. First, we define two (possibly partial) functions $Post_{\mathcal{M}} : M \times I \rightarrow M$ and $Out_{\mathcal{M}} : M \times I \rightarrow O$ such that $\Delta(m, i) = (Out_{\mathcal{M}}(m, i), Post_{\mathcal{M}}(m, i))$ for all $(m, i) \in M \times I$ such that $\Delta(m, i)$ is defined. We naturally extend these two functions to any sequence of inputs $u \in I^+$, denoted $Post^*_{\mathcal{M}}$ and $Out^*_{\mathcal{M}}$. In particular, for $u \in I^+$, $Post^*_{\mathcal{M}}(m, u)$ is the state reached by $\mathcal{M}$ when reading u from m, while $Out^*_{\mathcal{M}}(m, u)$ is the last output in O produced by $\mathcal{M}$ when reading u. The subscript $\mathcal{M}$ is omitted when $\mathcal{M}$ is clear from the context. Now, the language $L(\mathcal{M})$ of finite words in $(IO)^*$ accepted by $\mathcal{M}$ is defined as $L(\mathcal{M}) = \{ i_1 o_1 \dots i_n o_n \mid \forall 1 \le j \le n,\ Post^*_{\mathcal{M}}(m_{init}, i_1 \dots i_j) \text{ is defined and } o_j = Out^*_{\mathcal{M}}(m_{init}, i_1 \dots i_j) \}$. The language $L_\omega(\mathcal{M})$ of infinite words accepted by $\mathcal{M}$ is the topological closure of $L(\mathcal{M})$: $L_\omega(\mathcal{M}) = \{ w \in (IO)^\omega \mid Prefs(w) \cap (IO)^* \subseteq L(\mathcal{M}) \}$.
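
The definitions above can be illustrated with a minimal Python sketch of a preMealy machine. The class name `PreMealy`, its method names and the toy alphabets (inputs `"r"`, `"n"` and outputs `"g"`, `"-"`) are assumptions made for this illustration; the transition function is stored as a dictionary from (state, input) to (output, next state), following the signature of Δ.

```python
class PreMealy:
    """Partial Mealy machine: delta maps (state, input) to (output, next state)."""

    def __init__(self, states, init, delta):
        self.states, self.init, self.delta = set(states), init, dict(delta)

    def holes(self, inputs):
        """All pairs (m, i) on which delta is undefined."""
        return sorted((m, i) for m in self.states for i in inputs
                      if (m, i) not in self.delta)

    def run(self, input_word, start=None):
        """Return (last output, reached state), or None if a hole is hit."""
        state, out = (self.init if start is None else start), None
        for i in input_word:
            if (state, i) not in self.delta:
                return None
            out, state = self.delta[(state, i)]
        return out, state

    def accepts(self, word):
        """Membership of the finite word [(i1, o1), ..., (in, on)] in L(M)."""
        state = self.init
        for i, o in word:
            if (state, i) not in self.delta or self.delta[(state, i)][0] != o:
                return False
            state = self.delta[(state, i)][1]
        return True


# A one-state machine that grants on request and has a hole on input "n".
TOY = PreMealy(states={0}, init=0, delta={(0, "r"): ("g", 0)})
```

Here `TOY.holes(["n", "r"])` returns `[(0, "n")]`, so `TOY` is a preMealy machine but not a Mealy machine.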

The reactive synthesis problem A specification is a language $S \subseteq (IO)^\omega$. The reactive synthesis problem (or just synthesis problem for short) is the problem of constructing, given a specification S, a Mealy machine M such that $L_\omega(M) \subseteq S$ if it exists. Such a machine M is said to realize the specification S, also written $M \models S$. We also say that S is realizable if some Mealy machine M realizes it. The induced decision problem is called the realizability problem.

It is well-known that if S is ω-regular (recognizable by, e.g., a parity automaton [33]) the realizability problem is decidable [1] and moreover, a Mealy machine realizing the specification can be effectively constructed. The realizability problem is 2ExpTime-complete if S is given as an LTL formula [27] and ExpTime-complete if S is given as a universal co-Büchi automaton.

Theorem 1 ([7]). The realizability problem for a specification S given as a universal co-Büchi automaton A is ExpTime-complete. Moreover, if S is realizable and A has n states, then S is realizable by a Mealy machine with $2^{O(n \log_2 n)}$ states.

We generalize this result to the following realizability problem, which we first describe informally. Given a specification S and a preMealy machine P, the goal is to decide whether P can be completed into a Mealy machine which realizes S. We now define this problem formally. Given two preMealy machines $P_1, P_2$, we write $P_1 \preceq P_2$ if $P_1$ is a subgraph of $P_2$ in the following sense: there exists an injective mapping Φ from the states of $P_1$ to the states of $P_2$ which preserves the initial state ($s_0$ is the initial state of $P_1$ iff $\Phi(s_0)$ is the initial state of $P_2$) and the transitions ($\Delta_{P_1}(p, i) = (o, q)$ iff $\Delta_{P_2}(\Phi(p), i) = (o, \Phi(q))$). As a consequence, $L(P_1) \subseteq L(P_2)$ and $L_\omega(P_1) \subseteq L_\omega(P_2)$. Given a preMealy machine P, we say that a specification S is P-realizable if there exists a Mealy machine M such that $P \preceq M$ and M realizes S. Note that if P is a (complete) Mealy machine, S is P-realizable iff P realizes S. The next result is proved in [5]:

Theorem 2. Given a universal co-Büchi automaton A with n states defining a specification $S = L^\forall(A)$ and a preMealy machine P with m states and $n_h$ holes, deciding whether S is P-realizable is ExpTime-hard and in ExpTime (exponential in n and polynomial in m). Moreover, if S is P-realizable, it is P-realizable by a Mealy machine with $m + n_h 2^{O(n \log_2 n)}$ states. Hardness holds even if P has two states and A is a deterministic reachability automaton.

#### 3 Synthesis from safety specifications and examples

In this section, we present the learning framework we use to synthesise Mealy machines from examples and safety specifications. Its generalization to any ω-regular specification is described in Sec. 4 and solved by reduction to safety specifications. It is a two-phase algorithm: (1) it generalizes the examples while maintaining realizability of the specification, and outputs a preMealy machine, (2) it completes the preMealy machine into a full Mealy machine.

Phase 1: Generalizing the examples This phase exploits the examples by generalizing them as much as possible while maintaining realizability of the specification. It outputs a preMealy machine which is consistent with the examples and realizes the specification, if it exists. It is an RPNI-like learning algorithm [21,20] which includes specific tests to maintain realizability of the specification. In particular, it first builds a tree-shaped preMealy machine whose accepted language is exactly the set of prefixes Prefs(E) of the given set of examples E, called a prefix-tree acceptor (PTA). Then, it tries to merge as many states of the PTA as possible. The strategy used to select the state with which a given state is merged is a parameter of the algorithm, called a merging strategy $\sigma_G$. Formally, a merging strategy $\sigma_G$ is defined over 4-tuples (M, m, E, X) where M is a preMealy machine, m is a state of M, E is a set of examples and X is a subset of states of M (the candidate states to merge m with), and returns a state of X, i.e., $\sigma_G(M, m, E, X) \in X$.

The pseudo-code is given by Algo 1. Initially, it tests whether the set of examples E is consistent<sup>1</sup> and if yes, checks whether PTA(E) can be completed into a Mealy machine realizing the given specification S, thanks to Thm. 2. If that is the case, then it takes all prefixes of E as the set of examples, and enters a loop which iteratively coarsens some congruence ∼ over the states of PTA(E) by merging some of its classes. The congruence ∼ is initially the finest equivalence relation. The coarsening is done in a specific order: examples (which are states of PTA(E)) are taken in length-lexicographic order. When entering the loop with example e, the algorithm computes at line 4 all the states, i.e., all the examples e′ which have been processed already by the loop ($e' \prec_{ll} e$) and whose current class can be merged with the class of e (predicate Mergeable(PTA(E), ∼, e, e′)). State merging is a standard operation in automata learning algorithms; intuitively, the predicate holds when merging the ∼-class of e and the ∼-class of e′, and propagating this merge to the descendants of e and e′, does not result in any conflict. The formal definition is in [5]. At line 5, it filters the previous set by keeping only the states which, when merged with e, produce a preMealy machine which can be completed into a Mealy machine realizing S (again by Thm. 2). If after the filtering there are still several candidates for merge, one of them is selected with the merging strategy $\sigma_G$ and the equivalence relation is then coarsened via class merging (operation MergeClass(PTA(E), ∼, e, e′)). At the end, the algorithm returns the quotient of PTA(E) by the computed Mealy-congruence. As a side remark, when S is universal, i.e. $S = (IO)^\omega$, then it is realizable by any Mealy machine and therefore line 5 does not filter any of the candidates for merge. So, when S is universal, Algo 1 can be seen as an RPNI variant for learning preMealy machines.

<sup>1</sup> E is consistent if outputs uniquely depend on prefixes. Formally, it means that for all prefixes $u \in Prefs(E) \cap (IO)^* I$, there is a unique output $o \in O$ s.t. $uo \in Prefs(E)$.
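
The generalization phase can be sketched in Python as follows. This is a simplified illustration under several assumptions, not the algorithm of [5]: the merge is implemented as a plain union-find fold on the transition table, the merging strategy is fixed to "first mergeable candidate", and the realizability test of Thm. 2 is abstracted as a caller-supplied predicate `realizable`, which defaults to the universal specification. The function names `build_pta`, `merge` and `generalize` are hypothetical.

```python
def build_pta(examples):
    """Prefix-tree acceptor; states are tuples of (input, output) pairs.
    Returns None when the examples are inconsistent."""
    delta = {}
    for example in examples:
        state = ()
        for i, o in example:
            nxt = state + ((i, o),)
            if delta.setdefault((state, i), (o, nxt)) != (o, nxt):
                return None  # same history and input, two different outputs
            state = nxt
    return delta


def merge(delta, a, b):
    """Merge states a and b and fold their descendants; None on output conflict.
    Each class is represented by its length-lexicographically least member."""
    delta, alias, todo = dict(delta), {}, [(a, b)]

    def find(s):
        while s in alias:
            s = alias[s]
        return s

    while todo:
        x, y = sorted(map(find, todo.pop()), key=lambda s: (len(s), s))
        if x == y:
            continue
        alias[y] = x
        for (s, i), (o, t) in list(delta.items()):
            if s != y:
                continue
            del delta[(s, i)]  # move the transition from y to x
            if (x, i) not in delta:
                delta[(x, i)] = (o, t)
            elif delta[(x, i)][0] != o:
                return None
            else:
                todo.append((delta[(x, i)][1], t))
    return {(s, i): (o, find(t)) for (s, i), (o, t) in delta.items()}


def generalize(examples, realizable=lambda delta: True):
    """RPNI-like loop; `realizable` stands in for the P-realizability test."""
    delta = build_pta(examples)
    if delta is None or not realizable(delta):
        return None
    order = sorted({()} | {t for _, t in delta.values()}, key=lambda s: (len(s), s))
    done = []
    for e in order:
        alive = {()} | {t for _, t in delta.values()}
        if e not in alive:
            continue  # e was already merged into a smaller state
        for earlier in done:
            if earlier not in alive:
                continue
            merged = merge(delta, earlier, e)
            if merged is not None and realizable(merged):
                delta = merged
                break
        else:
            done.append(e)
    return delta
```

For instance, with inputs `"r"`, `"n"` and outputs `"g"`, `"-"`, the two examples `[("n", "-"), ("r", "g")]` and `[("r", "g"), ("n", "-")]` are generalized into a one-state machine that grants exactly when a request is present.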


Phase 2: completion of preMealy machines into Mealy machines As it only constructs the PTA and tries to merge its states, the generalization phase might not return a (complete) Mealy machine. In other words, the machine it returns might still contain some holes (missing transitions). The objective of this second phase is to fill those holes so as to obtain a Mealy machine that realizes the specification. More precisely, when a transition is not defined from some state m and some input $i \in I$, the algorithm must select an output symbol $o \in O$ and a state m′ to transition to, which can be either an existing state or a new state to be created (in that case, we write m′ = fresh to denote the fact that m′ is a fresh state). In our implementation, if it is possible to reuse a state m′ that was created during the generalization phase, it is favoured over other states, in order to exploit the examples. However, the algorithm for the completion phase we describe now does not depend on any particular strategy to pick states. Therefore, it is parameterized by a completion strategy $\sigma_C$, defined over all 4-tuples $(\mathcal{M}, m, i, X)$ where $\mathcal{M}$ is a preMealy machine with set of states M, (m, i) is a hole of $\mathcal{M}$, and $X \subseteq O \times (M \cup \{fresh\})$ is a list of candidate pairs (o, m′). It returns an element of X, i.e., $\sigma_C(\mathcal{M}, m, i, X) \in X$.

In addition to $\sigma_C$, the completion algorithm takes as input a preMealy machine $\mathcal{M}_0$ and a specification S, and outputs a Mealy machine which $\mathcal{M}_0$-realizes S, if it exists. The pseudo-code is given in Algo 2. Initially, it tests whether S is $\mathcal{M}_0$-realizable, and returns UNREAL if it is not. Then, it keeps on completing holes of $\mathcal{M}_0$. The list of output/state candidates is computed by the loop at line 5. Note that the for-loop iterates over $M \cup \{fresh()\}$, where fresh() is a procedure that returns a fresh state not in M. The algorithm maintains the invariant that at any iteration of the while-loop, S is $\mathcal{M}$-realizable, thanks to the test at line 7, based on Thm. 2. Therefore, the list of candidates is necessarily non-empty. Amongst those candidates, a single one is selected and the transition on (m, i) is added to $\mathcal{M}$ accordingly at line 10.

```
Algorithm 2: Comp(ℳ0, S, σC): preMealy machine completion algorithm
   Input: A preMealy machine ℳ0 = (M, minit, ∆), a specification S ⊆ (I.O)^ω
          given as a deterministic safety automaton, a completion strategy σC
   Output: A (complete) Mealy machine ℳ which ℳ0-realizes S if one exists,
           otherwise UNREAL.
 1 if S is not ℳ0-realizable then return UNREAL
 2 ℳ ← ℳ0
 3 while there exists a hole (m, i) ∈ M × I do
 4     candidates ← ∅
 5     for (o, m′) ∈ O × (M ∪ {fresh()}) do   // fresh() denotes a new state not in M
 6         ℳo,m′ ← (M ∪ {m′}, minit, ∆ ∪ {(m, i) ↦ (o, m′)})
 7         if S is ℳo,m′-realizable then
 8             candidates ← candidates ∪ {(o, m′)}
 9     (o, m′) ← σC(ℳ, m, i, candidates)
10     (M, ∆) ← (M ∪ {m′}, ∆ ∪ {(m, i) ↦ (o, m′)})
11     ℳ ← (M, minit, ∆)
12 return ℳ
```
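
The completion loop of Algorithm 2 can be transcribed into Python as the following sketch. It makes several assumptions for illustration: states are integers, the P-realizability test of Thm. 2 is replaced by a caller-supplied oracle `realizable`, and the toy oracle `echo_oracle` encodes the simple specification "grant exactly when a request is present". The names `complete`, `lazy_strategy` and `echo_oracle` are hypothetical.

```python
def complete(delta, init, inputs, outputs, realizable, strategy):
    """Fill the holes of a preMealy machine given as a transition dictionary.
    `realizable(delta)` is an oracle for the P-realizability test."""
    if not realizable(delta):
        return None  # UNREAL
    delta = dict(delta)
    states = {init} | {m for m, _ in delta} | {t for _, t in delta.values()}
    while True:
        hole = next(((m, i) for m in sorted(states) for i in inputs
                     if (m, i) not in delta), None)
        if hole is None:
            return delta  # no hole left: the machine is complete
        fresh = max(states) + 1  # a new state not in `states`
        candidates = [(o, t) for o in outputs for t in sorted(states) + [fresh]
                      if realizable({**delta, hole: (o, t)})]
        o, t = strategy(delta, hole, candidates, fresh)
        delta[hole] = (o, t)
        states.add(t)


def lazy_strategy(delta, hole, candidates, fresh):
    """Favour existing states: pick a fresh target only if nothing else works."""
    existing = [c for c in candidates if c[1] != fresh]
    return (existing or candidates)[0]


def echo_oracle(delta):
    """Toy oracle: every transition grants ("g") iff the input is a request ("r")."""
    return all((o == "g") == (i == "r") for (_, i), (o, _) in delta.items())


M0 = {(0, "r"): ("g", 0)}  # one state, hole on input "n"
COMPLETED = complete(M0, 0, ["r", "n"], ["g", "-"], echo_oracle, lazy_strategy)
```

With the lazy strategy, the hole `(0, "n")` is filled by a self-loop with output `"-"`, so no fresh state is created.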

Two-phase synthesis algorithm from specifications and examples The two-phase synthesis algorithm for safety specifications and examples, called SynthSafe(E, S, $\sigma_G$, $\sigma_C$), works as follows: it takes as input a set of examples E, a specification S given as a deterministic safety automaton, a generalizing strategy $\sigma_G$ and a completion strategy $\sigma_C$. It returns a Mealy machine M which realizes S and such that $E \subseteq L(M)$ if it exists. In a first step, it calls Gen(E, S, $\sigma_G$). If this call returns UNREAL, then SynthSafe returns UNREAL as well. Otherwise, the call to Gen returns a preMealy machine $M_0$. In a second step, SynthSafe calls Comp($M_0$, S, $\sigma_C$). If this call returns UNREAL, so does SynthSafe; otherwise SynthSafe returns the Mealy machine computed by Comp. The pseudo-code of SynthSafe can be found in [5].

The completion procedure may not terminate for some completion strategies. This is because the completion strategy could for instance keep on selecting pairs of the form (o, m′) where m′ is a fresh state. However, we prove that it always terminates for lazy completion strategies. A completion strategy $\sigma_C$ is said to be lazy if it favours existing states, which formally means that if $X \setminus (O \times \{fresh\}) \neq \emptyset$, then $\sigma_C(M, m, i, X) \notin O \times \{fresh\}$. The next theorem states correctness and termination of the algorithm for lazy completion strategies (assuming the functions $\sigma_G$ and $\sigma_C$ are computable in worst-case exponential time in the size of their inputs).

Theorem 3 (termination and correctness). For all finite sets of examples $E \subseteq (I.O)^*$, all specifications $S \subseteq (I.O)^\omega$ given as a deterministic safety automaton A with n states, all merging strategies $\sigma_G$ and all completion strategies $\sigma_C$, if SynthSafe(E, S, $\sigma_G$, $\sigma_C$) terminates, then it returns a Mealy machine M such that $E \subseteq L(M)$ and M realizes S, if it exists, otherwise it returns UNREAL. Moreover, SynthSafe(E, S, $\sigma_G$, $\sigma_C$) terminates if $\sigma_C$ is lazy, in worst-case exponential time (polynomial in the size<sup>2</sup> of E and exponential in n).

The proof of the latter theorem is a consequence of several results proved on the generalization and completion phases, and is given in [5].

A Mealy machine T is minimal if for all Mealy machines M such that L(T) = L(M), the number of states of M is at least that of T. The next result, proved in [5], states that any minimal Mealy machine realizing a specification S can be returned by our synthesis algorithm, provided it is given representative examples.

Theorem 4 (Mealy completeness). For all specifications $S \subseteq (I.O)^\omega$ given as a deterministic safety automaton, for all minimal Mealy machines M realizing S, there exists a finite set of examples $E \subseteq (I.O)^*$, of size polynomial in the size of M, such that for all generalizing strategies $\sigma_G$ and completion strategies $\sigma_C$, and all sets of examples E′ s.t. $E \subseteq E' \subseteq L(M)$, SynthSafe(E′, S, $\sigma_G$, $\sigma_C$) = M.

The polynomial upper bound given in the statement of Theorem 4 is more precisely the following: the cardinality of E is $O(m + n^2)$ where n is the number of states of M while m is its number of transitions. Moreover, each example $e \in E$ has length $O(n^2)$. More details can be found in Remark 1 of [5].

#### 4 Synthesis from ω-regular specifications and examples

We now consider the case where the specification S is given as a universal co-Büchi automaton. We consider this class of specifications as it is complete for ω-regular languages and allows for compact symbolic representations. Further in this section, we consider the case of LTL specifications.

Specifications given as universal co-Büchi automata Our solution for ω-regular specifications relies on a reduction to the safety case treated in Sec. 3. It relies on previous works that develop so-called Safraless algorithms for ω-regular reactive synthesis [24,29,18]. The main idea is to strengthen the acceptance condition of the automaton from co-Büchi to K-co-Büchi, which is a safety condition. This strengthening is complete for the plain synthesis problem (without examples) if K is large enough, in the worst case exponential in the number of states of the automaton (see e.g. [18]). Moreover, it allows for incremental synthesis algorithms: if the specification defined by the automaton with a k-co-Büchi acceptance condition is realizable, for k ≤ K, so is the specification defined by taking K-co-Büchi acceptance. Here, as we also take examples into account, we need to slightly adapt the results. The next theorem is proved in [5] while the next lemma is immediate:

<sup>2</sup> The size of E is the sum of the lengths of the examples of E.

Theorem 5. Given a universal co-Büchi automaton A with n states defining a specification $S = L^\forall(A)$ and a preMealy machine P with m states, we have that S is P-realizable iff $S' = L^\forall_K(A)$ is P-realizable for $K = nm|I|2^{O(n \log_2 n)}$.

Lemma 1. For all co-Büchi automata A, for all preMealy machines P, for all $k_1 \le k_2$, we have that $L^\forall_{k_1}(A) \subseteq L^\forall_{k_2}(A)$ and so if $L^\forall_{k_1}(A)$ is P-realizable then $L^\forall_{k_2}(A)$ is P-realizable. Furthermore, for all $k \ge 0$, if $S' = L^\forall_k(A)$ is P-realizable then $S = L^\forall(A)$ is P-realizable.

Thanks to the latter two results applied to P = PTA(E) for a set E of examples of size m, we can design an algorithm for synthesising Mealy machines from a specification defined by a universal co-Büchi automaton A with n states and E: it calls SynthSafe on the safety specification $L^\forall_k(A)$ and E for increasing values of k, until it concludes positively, or reaches the bound $K = 2^{O(mn \log_2 mn)} + 1$. In the latter case, it returns UNREAL. However, to apply SynthSafe properly, $L^\forall_k(A)$ must be represented by a deterministic safety automaton. This is possible as k-co-Büchi automata are determinizable [18].
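
The incremental scheme described above amounts to a simple loop over k. In the sketch below, `synth_safe` is a hypothetical callback standing for a call to SynthSafe on the k-co-Büchi strengthening of the specification; it is assumed to return `None` when that call answers UNREAL. The stub `toy_synth_safe` only serves to exercise the loop.

```python
def synth_learn(synth_safe, bound):
    """Call synth_safe(k) for k = 0, 1, ..., bound and return the first success.
    Returns a pair (k, machine), or None (UNREAL) if every call fails."""
    for k in range(bound + 1):
        machine = synth_safe(k)
        if machine is not None:
            return k, machine
    return None


def toy_synth_safe(k):
    """Stub: pretend the strengthened specification is realizable from k = 2 on."""
    return {"states": 1} if k >= 2 else None
```

By Lemma 1, a success for some k is a success for the original specification, which is why the loop can stop at the first positive answer.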

Determinization. The determinization of k-co-Büchi automata relies on a simple generalization of the subset construction: in addition to remembering the set of states that can be reached by a prefix of a run while reading an infinite word, the construction counts the maximal number of times a run prefix that reaches a given state $q$ has visited states labelled with color 1 (recall that an accepting run can visit at most $k$ such states). The states of the deterministic automaton are so-called counting functions, formally defined for a co-Büchi automaton $A = (Q, q_{init}, \Sigma, \delta, d)$ and $k \in \mathbb{N}$ as the set $CF(A, k)$ of functions $f : Q \to \{-1, 0, 1, \dots, k, k+1\}$. If $f(q) = -1$ for some state $q$, then $q$ is inactive (no run of $A$ reaches $q$ on the current prefix). The initial counting function $f_{init}$ maps all 1-colored initial states to 1, all 0-colored initial states to 0, and all other states to $-1$. We denote by $D(A, k) = (Q^D = CF(A, k), q^D_{init} = f_{init}, \Sigma, \delta^D, Q^D_{usf})$ the deterministic automaton obtained by this determinization procedure. It is formally defined in [5]. We can now give algorithm SynthLearn, in pseudo-code, as Algo 3.
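To make counting functions concrete, here is a small sketch. The initial function follows the description above. The successor rule and the unsafe test are assumptions: they follow the usual max-based update from the antichain-based synthesis literature, whereas the definition actually used is the one in [5]. The dictionary encoding and all function names are hypothetical.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

State = Hashable
Letter = Hashable
CountingFunction = Dict[State, int]


def initial_cf(states: Iterable[State], initial: Set[State],
               color: Dict[State, int]) -> CountingFunction:
    """Initial states get their color (0 or 1); all other states are inactive (-1)."""
    return {q: (color[q] if q in initial else -1) for q in states}


def successor_cf(f: CountingFunction, letter: Letter,
                 delta: Dict[Tuple[State, Letter], Set[State]],
                 color: Dict[State, int], k: int) -> CountingFunction:
    """Assumed update: keep, per target state, the maximal count over all
    active predecessors, adding the target's color and capping at k + 1."""
    g = {q: -1 for q in f}
    for q, count in f.items():
        if count == -1:
            continue  # inactive states contribute nothing
        for q2 in delta.get((q, letter), ()):
            g[q2] = max(g[q2], min(k + 1, count + color[q2]))
    return g


def is_unsafe(f: CountingFunction, k: int) -> bool:
    """Assumed reading of the unsafe set: some run prefix exceeded k visits."""
    return any(count > k for count in f.values())
```

Capping the counters at $k + 1$ keeps the state space finite, since all counts above $k$ are equally bad for the safety condition.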

Complexity considerations and improving the upper-bound As the automaton D(A, k) is in the worst-case exponential in the size of the automaton A, a direct application of Thm. 3 yields a doubly exponential time procedure. This complexity is a consequence of the fact that the P-realizability problem is ExpTime in the size of the deterministic automaton as shown in Thm. 2, and that the termination of the completion procedure is also worst-case exponential in the size of the deterministic automaton.

We show that we can improve the complexity of each call to SynthSafe and obtain an optimal worst-case (single) exponential complexity. First, we provide an algorithm to check $P$-realizability of a specification $S = L^{\forall}_k(A)$ that runs in time singly exponential in the size of $A$ and polynomial in $k$ and the size of $P$. Second, we provide a finer complexity analysis for the termination of the completion algorithm, which exhibits a worst-case exponential time in $|A|$. Those two improvements lead to an overall complexity of SynthLearn which is exponential in the size of the specification $A$ and polynomial in the size of the set of examples $E$. This is provably worst-case optimal because for $E = \emptyset$ the problem is already ExpTime-complete. We explain the first improvement next; the upper bound for termination is provided in [5].

Algorithm 3: SynthLearn($E$, $A$, $\sigma_G$, $\sigma_C$) – synthesis algorithm from ω-regular specification and examples by a reduction to safety

Checking $P$-realizability of a specification $S = L^{\forall}_k(A)$. To obtain a better complexity, we exploit some structure that exists in the deterministic automaton $D(A, k)$. First, the set of counting functions $CF(A, k)$ forms a complete lattice for the partial order $\preceq$ defined by $f_1 \preceq f_2$ if $f_1(q) \le f_2(q)$ for all states $q$. We denote by $f_1 \sqcup f_2$ the least upper bound of $f_1, f_2$, and by $W^A_k$ the set of counting functions $f$ such that the specification $L(D(A, k)[f])$ is realizable (i.e. the specification defined by $D(A, k)$ with initial state $f$). It is known that $W^A_k$ is downward-closed for $\preceq$ [18], because for all $f_1 \preceq f_2$, any machine realizing $L(D(A, k)[f_2])$ also realizes $L(D(A, k)[f_1])$. Therefore, $W^A_k$ can be represented compactly by the antichain $\lceil W^A_k \rceil$ of its $\preceq$-maximal elements. Now, the first improvement is obtained thanks to the following result:
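The lattice operations and the antichain representation can be illustrated with a few lines of code. This is a minimal sketch in which counting functions are dictionaries over the same set of states; the function names are hypothetical and not taken from the tool.

```python
from typing import Dict, Hashable, List

CountingFunction = Dict[Hashable, int]


def leq(f1: CountingFunction, f2: CountingFunction) -> bool:
    """Pointwise order: f1 is below f2 if f1(q) <= f2(q) for every state q."""
    return all(f1[q] <= f2[q] for q in f1)


def join(f1: CountingFunction, f2: CountingFunction) -> CountingFunction:
    """Least upper bound: the pointwise maximum."""
    return {q: max(f1[q], f2[q]) for q in f1}


def maximal_elements(functions: List[CountingFunction]) -> List[CountingFunction]:
    """Antichain of maximal elements of a finite set of counting functions."""
    unique: List[CountingFunction] = []
    for f in functions:
        if f not in unique:
            unique.append(f)
    return [f for f in unique if not any(f != g and leq(f, g) for g in unique)]


def in_downward_closure(f: CountingFunction,
                        antichain: List[CountingFunction]) -> bool:
    """Membership in a downward-closed set represented by its maximal elements."""
    return any(leq(f, g) for g in antichain)
```

Membership in a downward-closed set then costs one pointwise comparison per element of the antichain, which is why the size of the antichain appears in the complexity bound.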

Lemma 2. Given a preMealy machine $P = (M, m_0, \Delta)$, a co-Büchi automaton $A$, and $k \in \mathbb{N}$, for all states $m \in M$ we let $F^*(m) = \bigsqcup \{ f \mid \exists u \in (IO)^* \cdot \mathrm{Post}^*_P(m_0, u) = m \wedge \mathrm{Post}_D(f_0, u) = f \}$. Then, $L(D(A, k))$ is $P$-realizable iff there does not exist $m \in M$ such that $F^*(m) \notin W^A_k$.

It is easily shown that the operator $F^*$ can be computed in PTime. Thus, the latter lemma implies that there is an algorithm, polynomial in $|P|$, $|A|$, $k$, and the size of $\lceil W^A_k \rceil$, to check the $P$-realizability of $L^{\forall}_k(A)$. Formal details are given in [5].
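One plausible way to compute $F^*$ is a worklist fixpoint over the states of the preMealy machine, joining counting functions as they are propagated. The sketch below rests on assumptions not stated in the text: the preMealy machine is encoded as a dictionary from (state, input) to (output, next state), and `step` is a hypothetical function applying the deterministic automaton to one input/output pair, assumed to distribute over joins and to cap counters so that the iteration terminates. The procedure used by the authors is described in [5].

```python
from collections import deque
from typing import Callable, Dict, Hashable, List, Tuple

CountingFunction = Dict[Hashable, int]


def compute_f_star(initial_state: Hashable,
                   transitions: Dict[Tuple[Hashable, Hashable], Tuple[Hashable, Hashable]],
                   f0: CountingFunction,
                   step: Callable[[CountingFunction, Tuple[Hashable, Hashable]], CountingFunction],
                   ) -> Dict[Hashable, CountingFunction]:
    """Least fixpoint: F*(m) is the join of all counting functions reaching m."""
    f_star = {initial_state: dict(f0)}
    work = deque([initial_state])
    while work:
        m = work.popleft()
        for (src, i), (o, m2) in transitions.items():
            if src != m:
                continue
            g = step(f_star[m], (i, o))
            old = f_star.get(m2)
            new = g if old is None else {q: max(old[q], g[q]) for q in old}
            if new != old:  # the value at m2 grew: propagate again from m2
                f_star[m2] = new
                work.append(m2)
    return f_star


def p_realizable(f_star: Dict[Hashable, CountingFunction],
                 antichain: List[CountingFunction]) -> bool:
    """Criterion of Lemma 2: every F*(m) lies below some maximal element of W."""
    return all(any(all(f[q] <= g[q] for q in f) for g in antichain)
               for f in f_star.values())
```

Each state's value can only grow, and only a bounded number of times when counters are capped, so the loop stops after polynomially many updates under these assumptions.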

We end this subsection by summarizing the behavior of our synthesis algorithm for ω-regular specifications defined as universal co-Büchi automata.

Theorem 6. Given a universal co-Büchi automaton $A$ and a set of examples $E$, the synthesis algorithm SynthLearn returns, if it exists, a Mealy machine $M$ such that $E \subseteq L(M)$ and $L_\omega(M) \subseteq L^{\forall}(A)$, in worst-case exponential time in the size of $A$ and polynomial in the size of $E$. Otherwise, it returns UNREAL.

Specifications given as an LTL formula. We are now in a position to apply Alg. 3 to a specification given as an LTL formula $\varphi$. Indeed, thanks to the results of the subsection above, to provide an algorithm for LTL specifications, we only need to translate $\varphi$ into a universal co-Büchi automaton. This can be done as follows. It is well known (see [24]) that, given an LTL formula $\varphi$ over two sets of atomic propositions $P_I$ and $P_O$, we can construct in exponential time a universal co-Büchi automaton $A_\varphi$ such that $L^{\forall}(A_\varphi) = [\varphi]$, i.e. $A_\varphi$ recognizes exactly the set of words $w \in (2^{P_I} 2^{P_O})^\omega$ that satisfy $\varphi$. We then get the following theorem, which gives the complexity of our synthesis algorithm for a set of examples $E$ and an LTL formula $\varphi$. This complexity is provably worst-case optimal, as deciding whether $[\varphi]$ is realizable with $E = \emptyset$, i.e. the plain LTL realizability problem, is already 2ExpTime-complete [27].

Theorem 7. Given an LTL formula $\varphi$ and a set of examples $E$, the synthesis algorithm SynthLearn returns a Mealy machine $M$ such that $E \subseteq L(M)$ and $L_\omega(M) \subseteq [\varphi]$ if it exists, in worst-case doubly exponential time in the size of $\varphi$ and polynomial in the size of $E$. Otherwise, it returns UNREAL.

### 5 Implementation and Case study

We have implemented the algorithm SynthLearn of the previous section in a prototype tool, in Python, using the tool Acacia-Bonzai [11] to manipulate antichains of counting functions. We first explain the heuristics we have used to define state-merging and completion strategies, and then demonstrate how our implementation behaves on a case study whose goal is to synthesize the controller for an elevator. The interested reader can find in [5] other case studies, including a controller for an e-bike and two variations on mutual exclusion.

Merging and completion strategies implemented in our prototype. Our tool implements a merging strategy $\sigma_G$ where, given an example $e$ that leads in the current preMealy machine to a state $m$ and a set $\{m_1, m_2, \dots, m_k\}$ of candidates for merging, as computed in line 7 of Algorithm 1, we choose a state $m_i$ with a $\preceq$-minimal counting function $F^*(m_i)$, as defined in Lemma 2. Intuitively, favouring minimal counting functions preserves as much as possible the set of behaviors that are possible after the example $e$.

Our tool also implements a completion strategy $\sigma_C$ which, for every hole $(m, i)$ of the preMealy machine $M$, selects from the list of candidate pairs an element that again favours states associated with $\preceq$-minimal counting functions. For more details, we refer the reader to [5].

Lift Controller Example We illustrate how to use our tool to construct a suitable controller for a two-floor elevator system.

Considering two floors is sufficient to illustrate most of the main difficulties of a more general elevator. Inputs of the controller are given by two atomic propositions b0 and b1, which are true whenever the button at floor 0 (resp. floor 1) is pressed by a user. Outputs are given by the atomic propositions f0 and f1, true whenever the elevator is at floor 0 (resp. floor 1), and ser, true whenever the elevator is serving the current floor (i.e. doors are open). This controller should ensure the following core properties:

Fig. 2: Machine returned by our tool on the elevator specification without examples. Here, $q_0$, $q_1$, $q_2$, $q_3$ represent, respectively, the state where f0 is served when required, the state where b1 is pending, the state where f1 is served, and the state where b0 is pending.

Fig. 3: Mealy machine returned by our tool on the elevator specification with additional examples. The preMealy machine obtained after generalizing the examples and before completion is highlighted in red. This took 3.10s to be generated.


Additionally, we make the following assumption: whenever a button of floor 0 (or floor 1) is pressed, it must remain pressed until the floor has been served, i.e., G(b0 -> (b0 W (f0 & ser))) & G(b1 -> (b1 W (f1 & ser))).

Before going into the details of this example, let us explain the methodology that we apply to use our tool on it. We start by providing only the high-level specification $\varphi_{CORE}$ for the elevator given above. We obtain a first Mealy machine from the tool. We then inspect the machine to identify prefixes of behaviours that we are unhappy with, and for which we can provide better alternative decisions. Then we run the tool on $\varphi_{CORE}$ and the examples that we have identified, and we get a new machine. We proceed in this way until we are satisfied with the synthesized Mealy machine.

Let us now give details. When our tool is provided with this specification without any examples, we get the machine depicted in Fig. 2. This solution makes the controller switch between floor 0 and floor 1, sometimes unnecessarily. For instance, consider the trace s # {!b0 & !b1}{!f0 & f1 & !ser} # {!b0 & !b1}{f0 & !f1 & !ser}, where we let s = {!b0 & b1}{f0 & !f1 & !ser} # {!b0 & b1}{!f0 & f1 & ser}. Here, the last transition goes back to state $q_0$, where the elevator is at floor 0, although the elevator could have remained at floor 1 after serving floor 1. The methodology described above allows us to identify the following three examples:


With those additional examples, our tool outputs the machine of Fig. 3, which generalizes them and now ensures that moves of the elevator occur only when required. For example, the end of the first trace has been generalized into a loop on state $q_1$, ensuring that the elevator does not go to floor 0 from floor 1 unless b0 is pressed. We note that the number of examples provided here is much smaller than the theoretical (polynomial) upper bound proved in Theorem 4.

### 6 Conclusion

We have introduced synthesis with a few hints, which allows the user to guide synthesis using examples of expected executions of high quality solutions. Existing synthesis tools may provide unnatural solutions when fed with high-level specifications only. As providing complete specifications goes against the very goal of synthesis, we believe our algorithm has a greater potential in practice.

We have studied the computational complexity of problems that need to be solved during our synthesis procedure. We have proved our algorithm is complete: any Mealy machine M realizing a specification ϕ can be obtained from ϕ and a representative example set E, whose size is bounded polynomially in the size of M. We have implemented our algorithm in a prototype tool that extends Acacia-Bonzai [11] with tailored state-merging learning algorithms. We have shown that only a small number of examples are necessary to obtain high quality machines from high-level LTL specifications only. The tool is not fully optimized yet. While this is sufficient to demonstrate the relevance of our approach, we will work on efficiency aspects of the implementation.

As future work, we will consider extensions of the user interface to interactively and concisely specify sets of (counter-)examples to solutions output by the tool. Along the same lines, an interesting future direction is to handle parametric examples (e.g. an elevator with the number of floors given as a parameter). This would require providing a concise syntax to define parametric examples and designing efficient synthesis algorithms in this setting. We will also consider the possibility of formulating negative examples, as our theoretical results readily extend to this case and their integration in the implementation should be easy.

#### References



## Timed Automata Verification and Synthesis via Finite Automata Learning<sup>\*</sup>

Ocan Sankur

Univ Rennes, Inria, CNRS, Rennes, France ocan.sankur@cnrs.fr

Abstract. We present algorithms for model checking and controller synthesis of timed automata, seeing a timed automaton model as a parallel composition of a large finite-state machine and a relatively smaller timed automaton, and using compositional reasoning on this composition. We use automata learning algorithms to learn finite automata approximations of the timed automaton component, in order to reduce the problem at hand to finite-state model checking or to finite-state controller synthesis. We present an experimental evaluation of our approach.

### 1 Introduction

Timed automata [1] are a well-known formalism for modeling and verifying real-time systems. They can be used to model systems as finite automata, while using, in addition, clocks to impose timing constraints on the transitions. Using clock variables has advantages. They allow one to describe models that are expressive thanks to real-valued clock values; moreover, the use of specific clock variables enables optimizations such as sound and complete abstractions, also known as extrapolation operators [5]. Model checking algorithms have been developed and implemented in tools such as Uppaal [8], TChecker [28], and PAT [50].

One approach for model checking timed automata is based on representing the set of clock values with zones, which are particular polyhedra, and using explicit enumeration on the discrete states. There has been extensive research on sound and complete abstractions on zones, which improved the performance of the model checking tools, and made it possible to handle models with more complex time constraints; see [11] for a survey. However, this approach does not scale to models with large discrete spaces due to explicit enumeration. Several authors have developed algorithms to remedy this issue, and to attempt to adapt efficient model checking techniques for finite-state systems to timed systems. Extensions of binary decision diagrams (BDD) with clock constraints have been considered both for continuous time [53,10,23] and discrete time [42,51]. Another approach is to use predicate abstraction on clock variables, which enables efficient finite-state verification techniques based on BDDs or SAT solvers [17,16,46].

Controller synthesis is a related problem in which some transitions of the system are controllable and some are uncontrollable, and the objective is to compute a control strategy which guarantees that all induced runs of the system satisfy a given specification; see e.g. [52]. This problem is formalized using games, and in the case of real-time systems, using timed games [39,4]. Zone-based algorithms have been developed to solve timed games and compute control strategies [14], and are available in the Uppaal TIGA tool [7]. These algorithms suffer from the same limitations as the zone-based model checking algorithms. Although they can be efficient on instances with small discrete state spaces, they do not scale well to large systems. An attempt was made in [44] to implement the counter-example guided abstraction refinement scheme to handle larger discrete state spaces in timed games. On the other hand, there are several efficient finite-state game solvers, based on BDDs and SAT solvers, which can efficiently handle relatively large state spaces [31], but cannot handle real time.

<sup>\*</sup> This work was partially funded by ANR project Ticktac (ANR-18-CE40-0015).

In this work, we introduce an approach that applies both to model checking and to controller synthesis of timed automata, with the objective of combining the advantages of both timed automata model checkers and finite-state model checkers and game solvers. Our suggestion is to see the input model, without loss of generality, as a parallel composition of a finite-state machine $\mathcal{A}$ and a timed automaton $\mathcal{T}$. We specifically target instances where $\mathcal{A}$ is large, and $\mathcal{T}$ is relatively small but nontrivial. Note that this point of view was considered before in the verification of synchronous systems within a real-time environment [9]. As a novelty, for model checking, we apply a compositional reasoning rule on the product $\mathcal{A} \| \mathcal{T}$ by replacing the timed automaton $\mathcal{T}$ by a (small) deterministic finite automaton (DFA) $H$ which represents the behaviors of $\mathcal{T}$. To automatically select the DFA $H$, we adapt the algorithm of [43] to our setting, and use a DFA learning algorithm (such as L\* [3] or TTT [29]) to find an appropriate DFA either to prove the specification or to reveal a counterexample.

Our approach enjoys the principle of separation of concerns in the following sense. A timed automaton model checker is used by the learning algorithm to answer membership and equivalence queries (see Section 2.2); these are answered without referring to $\mathcal{A}$, thus avoiding the large discrete state space. Therefore, the timed automaton model checker is used in this approach for what it is designed for: handling real-time constraints encoded in $\mathcal{T}$, not dealing with excessive discrete state spaces. Once an appropriate DFA $H$ is found by the learning algorithm, the system $\mathcal{A} \| H$ is model-checked using a finite-state model checker whose focus is to deal with large discrete state spaces. We can thus benefit from the best of the two worlds: a state-of-the-art model checker for timed automata, which is used here somewhat as a theory solver, and any finite-state model checker based on BDDs, SAT solvers, or even explicit-state enumeration.

The application of the learning-based compositional reasoning of [43] to controller synthesis is more involved. Our objective was to find a way to exploit efficient finite-state game solvers [31] in the context of timed automata even if this meant having an incomplete algorithm. We describe a setting where a one-sided abstraction is applied for controller synthesis by replacing the timed automaton component by a learned DFA. Contrary to the model checking algorithm, our controller synthesis algorithm is sound but not complete: the algorithm may fail although there exists a control strategy, while any control strategy that is output is correct. More precisely, we consider timed games of the form $\mathcal{G} \| \mathcal{T}$ where $\mathcal{G}$ is a finite-state game and $\mathcal{T}$ is a timed automaton. We describe an algorithm that alternates between two phases. In the first phase, the goal is to find a DFA $H$ that is an overapproximation of $\mathcal{T}$. Once this is found, we use a finite-state game solver on $\mathcal{G} \| H$; if there is a control strategy, we prove that it can be applied in the original system $\mathcal{G} \| \mathcal{T}$. If not, then we obtain a counterstrategy $S$. We then switch to the second phase, whose goal is to check whether the counterstrategy is spurious; it does so by learning an underapproximation DFA $H$ of $\mathcal{T}$, and checking whether $S$ induces runs that are all in $H$. Accordingly, we either reject the instance or switch back to the first phase. As in the model checking algorithm, the timed automaton model checker is only used to answer queries independently from $\mathcal{G}$, and a finite-state game solver and a model checker are used to compute and analyze strategies in a discrete state space.

To the best of our knowledge, apart from [44], we present the first algorithm that can solve timed games with large discrete state spaces. Although the algorithm applies to a subset of timed games and is not complete, we believe it is of utmost importance to make progress on the scalability of timed game solvers in order for these methods to be applied in convincing applications. Our paper makes an attempt in this direction.

We evaluate our algorithms in comparison with state-of-the-art tools and show that our approach is competitive with the existing tools, and can allow both model checking and synthesis to scale to larger models. The approach offers an alternative treatment of timed models, which might be applied in other settings.

We present the model checking algorithm in Section 2 which contains formal definitions, the description of the algorithm, and the experiments. Section 3 presents our contributions on the controller synthesis problem, and includes formal definitions, the description of the algorithm, and the experiments. In Section 4, we provide a broader discussion on related works, and present our conclusions and perspectives.

### 2 Compositional Model Checking

#### 2.1 Preliminaries

Labeled Transition Systems and Finite Automata. We denote finite labeled transition systems (LTS) as tuples $(Q, q^0, \Sigma, T)$ where $Q$ is the set of states, $q^0 \in Q$ is the initial state, $\Sigma$ is a finite alphabet, and $T \subseteq Q \times (\Sigma \cup \{\epsilon\}) \times Q$ is the transition relation ($\epsilon$ labels silent transitions). Because we will consider synchronous products of LTSs, we will use silent transitions to define internal transitions not exposed for synchronization. A finite automaton is an LTS given with a set of accepting states $F \subseteq Q$, and is written $(Q, q^0, \Sigma, T, F)$. A run of an automaton is a sequence $q_1 e_1 q_2 e_2 \dots q_n$ where $q_1 = q^0$ and $e_i = (q_i, \sigma_i, q_{i+1}) \in T$ for some $\sigma_i \in \Sigma \cup \{\epsilon\}$, for each $1 \le i \le n - 1$. The trace of the run is the sequence $\sigma_1 \sigma_2 \dots \sigma_{n-1}$. An accepting run starts at $q^0$ and ends in $F$. The language of a finite automaton $\mathcal{A}$ is the set of the traces of all accepting runs of $\mathcal{A}$, and is denoted by $\mathcal{L}(\mathcal{A})$. We will consider deterministic finite automata (DFA), which do not have silent transitions and have at most one edge for each label from each state.

The parallel composition of two automata $\mathcal{A}_i = (Q_i, q_i^0, \Sigma, T_i, F_i)$, $i \in \{1, 2\}$, defined on the same alphabet, is the automaton $\mathcal{A}_1 \| \mathcal{A}_2 = (Q, q^0, \Sigma, T, F)$ with $Q = Q_1 \times Q_2$, $q^0 = (q_1^0, q_2^0)$, $F = F_1 \times F_2$, and $T$ contains $((q_1, q_2), \sigma, (q_1', q_2'))$ for all $(q_1, \sigma, q_1') \in T_1$ and $(q_2, \sigma, q_2') \in T_2$; and $((q_1, q_2), \epsilon, (q_1', q_2))$ for all $(q_1, \epsilon, q_1') \in T_1$ and $q_2 \in Q_2$; and symmetrically, $((q_1, q_2), \epsilon, (q_1, q_2'))$ for all $(q_2, \epsilon, q_2') \in T_2$ and $q_1 \in Q_1$.
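The product construction can be written down directly. The following is a minimal sketch, assuming automata are encoded as dictionaries with keys `states`, `init`, `trans` (a set of triples) and `accepting`, and that silent transitions carry the label `None`; this encoding is an illustration choice, not part of the paper.

```python
from itertools import product

EPS = None  # label used here for silent transitions (an encoding assumption)


def compose(a1: dict, a2: dict) -> dict:
    """Synchronous product: joint moves on shared labels, interleaved silent moves."""
    trans = set()
    for (q1, s1, r1) in a1["trans"]:
        if s1 is EPS:
            # a1 moves silently, a2 stays in place
            for q2 in a2["states"]:
                trans.add(((q1, q2), EPS, (r1, q2)))
        else:
            # both components move on the same visible label
            for (q2, s2, r2) in a2["trans"]:
                if s2 == s1:
                    trans.add(((q1, q2), s1, (r1, r2)))
    for (q2, s2, r2) in a2["trans"]:
        if s2 is EPS:
            # a2 moves silently, a1 stays in place
            for q1 in a1["states"]:
                trans.add(((q1, q2), EPS, (q1, r2)))
    return {
        "states": set(product(a1["states"], a2["states"])),
        "init": (a1["init"], a2["init"]),
        "trans": trans,
        "accepting": set(product(a1["accepting"], a2["accepting"])),
    }
```

The sketch builds the full product eagerly; a practical tool would typically explore only the reachable part.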

Finite Automata Learning. We use finite automata learning algorithms such as L\* [3,45] and TTT [29]. In the online learning model, the learning algorithm interacts with a teacher in order to learn a deterministic finite automaton recognizing a hidden regular language known to the teacher. The algorithm can make two types of queries. A membership query consists in asking whether a given word belongs to the language, to which the teacher answers yes or no. An equivalence query consists in creating a hypothesis automaton $H$, and asking the teacher whether $H$ recognizes the language. The teacher either answers yes, or answers no and provides a counterexample word which is in the symmetric difference of $\mathcal{L}(H)$ and the target language. Learning algorithms typically make a large number of membership queries, and a smaller number of equivalence queries.
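The two kinds of queries can be illustrated by a toy teacher. The sketch below is not the teacher used in the paper: the target language is given as a Python predicate, and the equivalence query is only approximated by enumerating all words up to a length bound, so a "yes" answer is not a proof of equivalence. The class name, the DFA encoding, and the bound are all assumptions.

```python
from itertools import product


def dfa_accepts(dfa: dict, word) -> bool:
    """Run a (possibly partial) DFA given as {'init', 'delta', 'accepting'}."""
    state = dfa["init"]
    for letter in word:
        state = dfa["delta"].get((state, letter))
        if state is None:
            return False  # missing transition: the word is rejected
    return state in dfa["accepting"]


class BoundedTeacher:
    """Toy teacher whose equivalence check is bounded, hence approximate."""

    def __init__(self, target, alphabet, max_len):
        self.target = target      # predicate on tuples of letters
        self.alphabet = list(alphabet)
        self.max_len = max_len

    def membership(self, word) -> bool:
        return self.target(tuple(word))

    def equivalence(self, hypothesis: dict):
        """Return a shortest disagreeing word up to max_len, or None."""
        for n in range(self.max_len + 1):
            for word in product(self.alphabet, repeat=n):
                if dfa_accepts(hypothesis, word) != self.target(word):
                    return word
        return None
```

In the setting of the paper, both kinds of queries are instead answered exactly by a timed automaton model checker.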

Timed Automata. We fix a finite set of clocks $C$. Clock valuations are the elements of $\mathbb{R}_{\ge 0}^C$. For $R \subseteq C$ and a valuation $v$, $v[R \leftarrow 0]$ is the valuation defined by $v[R \leftarrow 0](x) = v(x)$ for $x \in C \setminus R$ and $v[R \leftarrow 0](x) = 0$ for $x \in R$. Given $d \in \mathbb{R}_{\ge 0}$ and a valuation $v$, $v + d$ is defined by $(v + d)(x) = v(x) + d$ for all $x \in C$. We extend these operations to sets of valuations in the standard way. We write $\mathbf{0}$ for the valuation that assigns 0 to every clock.
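These operations on valuations translate directly into code. The sketch below represents a valuation as a dictionary from clock names to floats; the function names are hypothetical.

```python
from typing import Dict, Iterable

Valuation = Dict[str, float]


def zero(clocks: Iterable[str]) -> Valuation:
    """The valuation assigning 0 to every clock."""
    return {x: 0.0 for x in clocks}


def reset(v: Valuation, clocks: Iterable[str]) -> Valuation:
    """v[R <- 0]: clocks in R are set to 0, the others are unchanged."""
    to_reset = set(clocks)
    return {x: (0.0 if x in to_reset else t) for x, t in v.items()}


def delay(v: Valuation, d: float) -> Valuation:
    """v + d: every clock advances by the same non-negative amount d."""
    if d < 0:
        raise ValueError("delays must be non-negative")
    return {x: t + d for x, t in v.items()}
```

All clocks advance at the same rate, which is why a delay is a single number rather than one value per clock.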

We consider a clock named 0 which has the constant value 0, and let $C_0 = C \cup \{0\}$. An atomic guard is a formula of the form $x \bowtie k$ or $x - y \bowtie k$ where $x, y \in C_0$, $k, l \in \mathbb{N}$, and ${\bowtie} \in \{<, \le, >, \ge\}$. A guard is a conjunction of atomic guards. A valuation $v$ satisfies a guard $g$, denoted $v \models g$, if all atomic guards are satisfied when each $x \in C$ is replaced by $v(x)$. Let $\Phi_C$ denote the set of guards for $C$.
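Guard satisfaction can be sketched in the same style. Here an atomic guard is encoded as a tuple `(x, y, op, k)` standing for $x - y \bowtie k$, and the special clock named `"0"` makes the simple form $x \bowtie k$ a special case; this encoding is an assumption made for illustration.

```python
import operator
from typing import Dict, Iterable, Tuple

ZERO = "0"  # the special clock whose value is always 0
OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge}

AtomicGuard = Tuple[str, str, str, int]  # (x, y, op, k) stands for x - y op k


def clock_value(v: Dict[str, float], x: str) -> float:
    return 0.0 if x == ZERO else v[x]


def satisfies(v: Dict[str, float], guard: Iterable[AtomicGuard]) -> bool:
    """A guard is a conjunction: every atomic guard must hold under v."""
    return all(OPS[op](clock_value(v, x) - clock_value(v, y), k)
               for (x, y, op, k) in guard)
```

For instance, `("x", ZERO, "<=", 3)` encodes $x \le 3$ and `("x", "y", ">", 1)` encodes $x - y > 1$.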

A timed automaton $\mathcal{T}$ is a tuple $(L, \ell_0, \Sigma, \mathrm{Inv}, C, E, F)$, where $L$ is a finite set of locations, $\ell_0 \in L$ is the initial location, $\Sigma$ is the alphabet, $\mathrm{Inv} : L \to \Phi_C$ gives the invariants, $C$ is a finite set of clocks, and $E \subseteq L \times \Phi_C \times \Sigma \times 2^C \times L$ is a set of edges. An edge $e = (\ell, g, \sigma, R, \ell')$ is also written as $\ell \xrightarrow{g, \sigma, R} \ell'$. $F \subseteq L$ is the set of accepting locations.

A run of $\mathcal{T}$ is a sequence $r = q_1 e_1 q_2 e_2 \dots q_n$ where $q_i \in L \times \mathbb{R}_{\ge 0}^C$, $q_1 = (\ell_0, \mathbf{0})$, and writing $q_i = (\ell, v)$ for each $1 \le i \le n$, we have $v \in \mathrm{Inv}(\ell)$. If $i < n$, then either $e_i \in \mathbb{R}_{>0}$ and $v + e_i \in \mathrm{Inv}(\ell)$, in which case $q_{i+1} = (\ell, v + e_i)$, or $e_i = (\ell, g, \sigma, R, \ell') \in E$, in which case $v \models g$ and $q_{i+1} = (\ell', v[R \leftarrow 0])$. The run is accepting if the last location is in $F$. The trace of the run $r$ is the word $\sigma_0 \sigma_1 \dots \sigma_n$ where $\sigma_i$ is the label of edge $e_i$.

The untimed language of the timed automaton $\mathcal{T}$ is the set of the traces of the accepting runs of $\mathcal{T}$, and is denoted by $\mathcal{L}(\mathcal{T})$.

A timed automaton is label-deterministic if at each location $\ell$, for each label $\sigma \in \Sigma$, there is at most one transition leaving $\ell$ labelled by $\sigma$; in other terms, the finite automaton obtained by removing all clocks is deterministic.

We consider the parallel composition of a finite automaton $\mathcal{A} = (Q, q^0, \Sigma, T, F)$ and a timed automaton $\mathcal{T} = (L, \ell_0, \Sigma, \mathrm{Inv}, C, E, F_{\mathcal{T}})$, which is a new timed automaton. Intuitively, a transition labeled by $\sigma$ consists in an arbitrary number of silent transitions of $\mathcal{A}$, followed by a joint $\sigma$-transition of both components. The guard and the reset of the overall transition are those of the transition of $\mathcal{T}$. Formally, let $\mathcal{A} \| \mathcal{T} = (L', \ell_0', \Sigma, \mathrm{Inv}', C, E', F')$ with $L' = Q \times L$, $\mathrm{Inv}' : (q, \ell) \mapsto \mathrm{Inv}(\ell)$, $\ell_0' = (q^0, \ell_0)$, and $E'$ contains all edges of the form $((q, \ell), g, \sigma, R, (q', \ell'))$ such that $(\ell, g, \sigma, R, \ell') \in E$, and there exists a sequence $q = q_0, q_1, \dots, q_k, q_{k+1} = q'$ of states of $\mathcal{A}$ such that $(q_0, \epsilon, q_1), \dots, (q_{k-1}, \epsilon, q_k), (q_k, \sigma, q_{k+1})$ are transitions of $\mathcal{A}$. We let $F' = F \times F_{\mathcal{T}}$.

It follows from the definition of the parallel composition that $\mathcal{L}(\mathcal{A} \| \mathcal{T}) = \mathcal{L}(\mathcal{A}) \cap \mathcal{L}(\mathcal{T})$.

Target Timed Automata Instances. Our main motivation is to consider real-time systems that are modeled naturally as $\mathcal{A} \| \mathcal{T}$. Typically, $\mathcal{A}$ has a large (discrete) state space, and $\mathcal{T}$ is a relatively small timed automaton, but with potentially complex time constraints involving several clocks.

It should be clear, however, that any timed automaton $\mathcal{T}$ can be seen as such a product as follows. Let $\mathcal{A}$ be a finite automaton identical to $\mathcal{T}$ except that guards and resets are removed; and for each pair of guard $g$ and reset $r$, a fresh label $\sigma_{g,r}$ is defined and added to each edge with the said guard and reset. Now, define the timed automaton $\mathcal{T}'$ as a single location with the same clocks as $\mathcal{T}$, with one self-loop for each pair $(g, r)$: such an edge is labeled by $\sigma_{g,r}$, has guard $g$, and reset $r$. We have that $\mathcal{T}$ is isomorphic to $\mathcal{A} \| \mathcal{T}'$.
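The decomposition just described can be sketched as follows. The encoding is hypothetical: edges are tuples `(location, guard, label, resets, location')` with hashable guards, the fresh label $\sigma_{g,r}$ is represented by the pair `(guard, resets)` and replaces the original label, and invariants are ignored, as in the construction above.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, Set, Tuple

Edge = Tuple[Hashable, Hashable, Hashable, FrozenSet[str], Hashable]
FreshLabel = Tuple[Hashable, FrozenSet[str]]


def decompose(edges: Iterable[Edge]):
    """Split a timed automaton into an untimed part and a one-location timed part.

    Returns the transitions of the finite automaton (same locations, fresh
    labels) and the self-loops of the single-location timed automaton, given
    as a map from each fresh label to its (guard, resets) pair.
    """
    fa_transitions: Set[Tuple[Hashable, FreshLabel, Hashable]] = set()
    self_loops: Dict[FreshLabel, Tuple[Hashable, FrozenSet[str]]] = {}
    for (loc, guard, _label, resets, loc2) in edges:
        fresh = (guard, frozenset(resets))       # plays the role of sigma_{g,r}
        fa_transitions.add((loc, fresh, loc2))   # untimed copy of the edge
        self_loops[fresh] = (guard, frozenset(resets))
    return fa_transitions, self_loops
```

Edges sharing the same guard and reset share one self-loop, so the timed part stays small even when the original automaton has many edges.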

An example is given in Figure 1 which shows how a simple scheduling setting can be modeled in this way. Here, the finite automaton is simple and only stores the mapping from machines to the tasks they are executing. Typically, if the machines or the processes executing tasks have internal states, these could be modeled in A as well without altering the timed automaton.

#### 2.2 Learning-Based Compositional Model Checking Algorithm

We present an algorithm for model checking the untimed language $\mathcal{L}(\mathcal{A} \| \mathcal{T})$.

Although it is known that the untimed language is regular [1], the size of the corresponding finite automaton can be exponential, so a direct computation is not efficient. We will be looking for a finite automaton $H$ which is an overapproximation of $\mathcal{T}$, i.e. $\mathcal{L}(\mathcal{T}) \subseteq \mathcal{L}(H)$. Here, $H$ stands for the hypothesis made by the learning algorithm. We will in fact use the following lemma.

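Lemma 1. For all finite automata $\mathcal{A}$ and $H$, and timed automata $\mathcal{T}$ on a common alphabet $\Sigma$, if $\mathcal{L}(\mathcal{T}) \subseteq \mathcal{L}(H)$, then $\mathcal{L}(\mathcal{A} \| \mathcal{T}) \subseteq \mathcal{L}(\mathcal{A} \| H)$.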
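A short justification of the lemma can be spelled out from the language identity for the parallel composition stated in Section 2.1. It additionally assumes the analogous identity for the product of two finite automata, $\mathcal{L}(\mathcal{A} \| H) = \mathcal{L}(\mathcal{A}) \cap \mathcal{L}(H)$, which is not stated explicitly in the text. Under the hypothesis $\mathcal{L}(\mathcal{T}) \subseteq \mathcal{L}(H)$:

$$\mathcal{L}(\mathcal{A} \| \mathcal{T}) = \mathcal{L}(\mathcal{A}) \cap \mathcal{L}(\mathcal{T}) \subseteq \mathcal{L}(\mathcal{A}) \cap \mathcal{L}(H) = \mathcal{L}(\mathcal{A} \| H)$$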

Fig. 1. Timed automaton $\mathcal{A} \| \mathcal{T}$ modeling a simple scheduling policy. The finite automaton $\mathcal{A}$ (top) models a scheduler which schedules tasks (0 and 1) immediately when they become ready (ready[0] and ready[1]) on machines $M_0$ and $M_1$, using $M_0$ first if it is available. The timed automaton $\mathcal{T}$ (bottom), given here as a network of timed automata, models interarrival and computation times for each task.

In other terms, by replacing the timed automaton $\mathcal{T}$ by its overapproximation, we obtain an overapproximation of the compound system in terms of untimed language. So if a linear property can be established on $\mathcal{A} \| H$ for an appropriate $H$, then the property also holds on the original system.

Let us present the above property as a verification rule. Assuming that we want to establish $\mathcal{L}(\mathcal{A} \| \mathcal{T}) \subseteq \mathsf{Spec}$ for some language $\mathsf{Spec}$, we have

$$\frac{\mathcal{L}(\mathcal{T}) \subseteq \mathcal{L}(H) \quad \mathcal{L}(\mathcal{A} \| H) \subseteq \mathsf{Spec}}{\mathcal{L}(\mathcal{A} \| \mathcal{T}) \subseteq \mathsf{Spec}} \text{ } \text{Asym} \tag{1}$$

Here, H serves as an assumption we make on T when verifying A; so as in Lemma 1, we can use H instead of T during model checking. The rule (1) is well known as the assume-guarantee verification rule [19], and has been used in model checking finite-state systems as well as timed automata [35]. The assumption H can either be provided by the user, or automatically computed using automata learning as in [43]. Intuitively, the model checking algorithm we present in this section is an application of [43] to our specific case.

Figure 2 presents the overview of the algorithm. The membership queries of the learning algorithm are answered by the membership oracle; the equivalence query with conjecture $H$ is answered by the inclusion oracle. When the conjecture $H$ passes the inclusion check, we model-check $\mathcal{A} \| H$. When this is successful, we stop and declare that the original system $\mathcal{A} \| \mathcal{T}$ satisfies the specification. Otherwise, a counterexample $w \in \mathcal{L}(\mathcal{A} \| H) \setminus \mathsf{Spec}$ was found, and we use a realizability check to see whether $w \in \mathcal{L}(\mathcal{T})$ (this is actually done by the membership oracle). If the answer is yes, then the counterexample is confirmed, and we stop. Otherwise, we inform the learning algorithm that $w$ must be excluded, and continue the learning process.
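The handling of a single conjecture in this loop can be sketched as below. The three oracles are passed in as hypothetical callables, since the text describes their roles and not their interfaces: `inclusion_oracle` is assumed to return `None` or a word of the timed language missing from the conjecture, `model_check` to return `None` or a word violating the specification, and `is_member` to decide membership in the untimed language of the timed automaton.

```python
from typing import Callable, Optional, Sequence, Tuple

Word = Sequence[str]


def handle_conjecture(
    h,
    inclusion_oracle: Callable[[object], Optional[Word]],
    model_check: Callable[[object], Optional[Word]],
    is_member: Callable[[Word], bool],
) -> Tuple[str, Optional[Word]]:
    """Process one conjecture h and say what the learner should do next.

    Returns ("refine", w) when w must be fed back to the learner,
    ("safe", None) when the specification is proved, and
    ("unsafe", w) when w is a confirmed counterexample.
    """
    missing = inclusion_oracle(h)
    if missing is not None:
        return ("refine", missing)  # h is not yet an overapproximation
    cex = model_check(h)
    if cex is None:
        return ("safe", None)       # the abstraction satisfies the specification
    if is_member(cex):
        return ("unsafe", cex)      # the violating word is a real behavior
    return ("refine", cex)          # spurious: the word must be excluded from h
```

Only `model_check` ever looks at the large finite-state component; the other two oracles involve the timed automaton alone, which reflects the separation of concerns motivating the approach.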

Note that this algorithm can be used for any regular language specification Spec. We focus on safety properties in our experiments, presented next.

Fig. 2. The learning-based compositional model checking algorithm. The box on the left is a DFA learning algorithm, while the oracles answering the queries of the learning algorithm are shown on the right and correspond to the teacher.

#### 2.3 Experiments

We built a prototype implementation of our algorithm in Scala, using the TTT automata learning algorithm [29] from the learnlib library [30], and the associated automatalib for manipulating finite automata<sup>1</sup>. We used the TChecker [28] model checker for implementing membership and inclusion oracles. For the latter, we complement H into H<sup>c</sup>, and check the emptiness of the parallel composition of T with H<sup>c</sup>. We use the NuSMV model checker for finite-state model checking.
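The complement-and-emptiness construction behind the inclusion oracle can be illustrated on finite automata. The sketch below assumes that T is replaced by a complete DFA (the actual oracle runs on a timed automaton); the class `DFA` and the function `inclusion_counterexample` are hypothetical names, not part of any of the libraries mentioned above.

```python
from collections import deque


class DFA:
    """Complete DFA; delta maps (state, letter) to the successor state."""

    def __init__(self, alphabet, delta, init, accepting):
        self.alphabet = alphabet
        self.delta = delta
        self.init = init
        self.accepting = set(accepting)

    def states(self):
        return {q for (q, _) in self.delta} | set(self.delta.values()) | {self.init}

    def complement(self):
        # Swapping accepting and rejecting states is only correct because
        # the automaton is deterministic and complete.
        return DFA(self.alphabet, self.delta, self.init,
                   self.states() - self.accepting)


def inclusion_counterexample(t, h):
    """Return a shortest word accepted by t but not by h, or None."""
    hc = h.complement()
    start = (t.init, hc.init)
    queue, seen = deque([(start, "")]), {start}
    while queue:  # breadth-first emptiness check of the product of t and hc
        (p, q), word = queue.popleft()
        if p in t.accepting and q in hc.accepting:
            return word
        for a in t.alphabet:
            nxt = (t.delta[(p, a)], hc.delta[(q, a)])
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, word + a))
    return None
```

A returned word can be handed back to the learner as a counterexample to the conjecture; `None` means the product is empty and the inclusion holds.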

The overall input consists of an SMV file describing A and a TChecker timed automaton describing T. We use define expressions in SMV to define the labels Σ, while TChecker allows us to tag each transition with a label.

<sup>1</sup> https://github.com/osankur/compRTMC/releases/tag/tacas23

Table 1. Model checking benchmarks. The column #Clk is the number of clocks; #C is the number of conjectures made by the DFA learning algorithm; #M is the number of membership queries; and |DFA| is the size of the final finite automaton learned. The safety specification holds on all models but those marked with \*. In each cell, — means out of memory (8GB), and - means time out (30 minutes).


We compare our algorithm on a set of benchmarks with the model checkers Uppaal [8] and nuXmv, which has a timed automata model checker [16]. The former implements a zone-based enumerative algorithm, while the latter uses predicate abstraction through IC3IA. We describe some of the benchmarks here.

The leader election protocol is a distributed protocol that can recover from crashes [22], extended here with periodic activation times and crash durations. The first four rows of Table 1 correspond to the case where one of the processes crashes when its internal state enters an error state. Internal states are modeled using Boolean circuits from the synthesis competition (SYNTCOMP) benchmarks. The stateless version is more abstract: there is no internal state model, and crashes can occur at any time. The letters A, B, C, D indicate different timed automaton models. Uppaal was more efficient at solving the stateless version but failed in the full version due to the large discrete state space. The compositional algorithm was effective in verifying all instances except the D case, which required a large finite automaton to be learned. One can notice an overhead of the compositional algorithm in the stateless version due to the computation of the finite automaton H. This was particularly an issue in the stateless D case, where Uppaal could find a counterexample trace faster; nuXmv was not able to solve these instances.

The flooding time synchronization protocol (FTSP) is a leader election algorithm for multi-hop wireless sensor networks used for clock synchronization [40], and has been the subject of formal verification before [41,34]. We consider the abstract model used in [48] for parameterized verification allowing one to verify the model for a large number of topologies. Our algorithm was faster for the model with 3 processes, although none of the tools scaled to 4 processes.

Overall, the experiments show that our algorithm is competitive with state-of-the-art tools; while it does not improve the performance uniformly on all considered benchmarks, it does allow us to solve instances that are not solvable by other tools, and sometimes improves performance compared to both a zone-based approach (Uppaal) and SAT-based algorithms (nuXmv).

#### 3 Compositional Controller Synthesis

#### 3.1 Preliminaries

Games. A finite safety game is a pair (G, Bad) where G is an LTS $(Q_E \dot{\cup} Q_C, q_0, \Sigma, T)$ whose set of states is partitioned into Environment states ($Q_E$) and Controller states ($Q_C$), and Bad ⊆ $Q_E \dot{\cup} Q_C$ is an objective. The game is played between two players, namely, Controller and Environment. At each state q ∈ $Q_C$, Controller determines the successor by choosing an edge from q, and Environment determines the successor from states q ∈ $Q_E$. A strategy for Controller (resp. Environment) maps finite runs of $(Q_E \dot{\cup} Q_C, q_0, \Sigma, T)$ ending in $Q_C$ (resp. $Q_E$) to an edge leaving the last state. A pair of strategies, one for each player, induces a unique infinite run from the initial state. A run is winning for Controller if it does not visit Bad; it is winning for Environment otherwise. A winning strategy for Controller is such that for all Environment strategies, the run induced by the two strategies is winning for Controller. Symmetrically, Environment has a winning strategy if for all Controller strategies, the induced run is winning for Environment. A strategy is positional (a.k.a. memoryless) if it only depends on the last state of the given run.

The parallel composition of (G, Bad) and a deterministic finite automaton $\mathcal{F} = (Q', q'_0, \Sigma, T', F)$ on alphabet Σ is a new game whose LTS is $G \| \mathcal{F}$, in which the Controller states are $Q_C \times Q'$, the Environment states are $Q_E \times Q'$, and the objective is Bad × F (notice that Controller thus has a safety objective).

Finite games were extended to the real-time setting as timed games [39,4]. A timed game is a timed automaton $T = (L_E \dot{\cup} L_C, \ell_0, \Sigma, \mathit{Inv}, C, E, \mathit{Bad})$ with the exception that its edges are labeled by $\Sigma \cup \{\epsilon\}$ (and not just by Σ as in the previous section), and the locations are partitioned as $L_E \dot{\cup} L_C$ into Environment locations and Controller locations. The semantics is defined by letting Environment choose the delay and the edge to be taken at locations $L_E$, while Controller chooses these at locations $L_C$. Formally, a strategy for Environment (resp. Controller) is a function that maps a run ending in $L_E$ (resp. $L_C$) to a pair consisting of a delay and an edge enabled from the state reached after the delay. A run is winning for Controller if it does not visit Bad. A Controller (resp. Environment) strategy is winning for objective Bad if for all Environment (resp. Controller) strategies, the induced run from the initial state is winning (resp. not winning) for Controller. A run r is compatible with a strategy S for Controller (resp. Environment) if there exists an Environment (resp. Controller) strategy S′ such that r is induced by S and S′.

The parallel composition of a finite safety game (G, Bad) and a timed automaton $T = (L, \ell_0, \Sigma, \mathit{Inv}, C, E, F)$ on common alphabet Σ is the timed game G ∥ T where Controller locations are $Q_C \times L$, and Environment locations are $Q_E \times L$.

Positional strategies exist both for reachability and safety objectives in finite and timed games. Both finite and timed games are known to be determined for reachability and safety objectives. For instance, if Controller does not have a winning strategy for the safety objective, then Environment has a strategy ensuring the reachability of Bad [39,4].
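For finite safety games, a positional winning strategy can be computed with the classical attractor construction. The sketch below is a generic illustration, not the solver used in the experiments; it assumes an explicit game graph in which every state has at least one outgoing edge, and the function name `solve_safety_game` is hypothetical.

```python
def solve_safety_game(env_states, ctrl_states, edges, init, bad):
    """Return (controller_wins, positional_strategy) for a finite safety game.

    env_states and ctrl_states are sets; edges maps each state to the list
    of its successors (edge labels are omitted in this sketch).
    """
    # Environment attractor of Bad: the states from which Environment can
    # force the play to visit Bad.
    attr = set(bad)
    changed = True
    while changed:
        changed = False
        for q in (env_states | ctrl_states) - attr:
            succ = edges[q]
            if q in env_states:
                forced = any(s in attr for s in succ)  # Environment picks the edge
            else:
                forced = all(s in attr for s in succ)  # Controller cannot escape
            if forced:
                attr.add(q)
                changed = True
    # Outside the attractor, Controller always has a successor that stays outside.
    strategy = {q: next(s for s in edges[q] if s not in attr)
                for q in ctrl_states - attr}
    return init not in attr, strategy
```

By determinacy, if the initial state lies in the attractor then Environment has a strategy reaching Bad; otherwise the returned map is a positional winning strategy for Controller.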

Target Timed Game Instances. We consider controller synthesis problems described as timed games of the form (G ∥ T, Bad × F) where (G, Bad) is a finite safety game, and T is a timed automaton. In addition, we assume that G ∥ T is Controller-silent, defined as follows.

Definition 1. The timed game (G ∥ T, Bad × F) on alphabet Σ is Controller-silent if 1) all Controller transitions are silent; and 2) all Controller locations in T are urgent, that is, an invariant ensures that no time can elapse.

Hence, we again separate the game G, defined on a possibly large discrete state space, from the real-time constraints, which are given separately in T.

The intuition behind the semantics is the following: because the game is played in G ∥ T and G is Controller-silent, the timed automaton model T is only used to disallow some of the Environment transitions according to real-time constraints, while Controller's actions are instantaneous responses to Environment's actions and thus are unaffected by the constraints of T. One can think of the timed automaton as some form of scheduler that schedules uncontrollable events in the system, so the order of these is determined by Environment. This assumption is restrictive; for instance, it excludes controller synthesis problems where the control strategy is to choose delays to execute some events. Nonetheless, this asymmetric view enables the one-sided abstraction framework presented in the next section, where Environment transitions are approximated by a DFA.

An example is given in Figure 3. The finite game drawn here only shows the structure of the game. It has, in addition, integer variables rob_x, rob_y, obs_x, obs_y encoding the positions of the robot and of the obstacle, and a Boolean variable door to encode the state of the door. The state e belongs to Environment, which can move the obstacle in any direction, close or open the door, or let the robot move by going to state c. The state c belongs to Controller. All its leaving transitions are silent, and correspond to moving the robot in four directions. These transitions have preconditions, not shown in the figure, that check whether the moves are possible, and have updates that modify the state variables. The timed automaton, given as a network of three timed automata, determines the timings of these events. One can notice, for example, that the robot is moving faster than the obstacle, and that whenever the door is closed, it remains so for 10 time units.

Fig. 3. The sketch of a timed game G ∥ T modelling a planning problem. The finite game models a robot and an obstacle moving in a grid world as shown on top right. The cells r and o show, respectively, the initial positions of the robot and the obstacle. The robot cannot cross walls (shown in thick segments), and can only cross the door if it is open. Here four silent transitions were marked with r, l, u, d for readability; in reality, these are all labeled by ε.

#### 3.2 One-Sided Abstraction

Thanks to the assumption we make on considered timed games, we show that by replacing T by a DFA H that is an overapproximation, we obtain an abstract game in which Controller strategies can be transferred to the original game. This is formalized in the next lemma (the proof is in the appendix).

Lemma 2. Consider a Controller-silent timed game (G ∥ T, Bad × F), and a complete DFA H with accepting states $F_H$, satisfying L(T) ⊆ L(H).


Note that in the above lemma, it is crucial that the game is Controller-silent. In fact, if Controller could take edges that synchronize with T, then we may not be able to apply a strategy in G ∥ H to G ∥ T, since such a strategy may prescribe traces that are not accepted in T. Moreover, if Controller locations were not urgent, we would not know how to select the delays when mapping the strategy to G ∥ T.

Fig. 4. The learning-based compositional controller synthesis algorithm for the input timed game G ∥ T, with G a Controller-silent finite game, and T a label-deterministic timed automaton. Two automata learning algorithms run in parallel to learn under- and over-approximations $\underline{H}$ and $\overline{H}$ such that $L(\underline{H}) \subseteq L(T) \subseteq L(\overline{H})$.

#### 3.3 Learning-Based Compositional Controller Synthesis Algorithm

We now present our compositional controller synthesis algorithm whose overview is given in Figure 4. The algorithm for controller synthesis is more involved than the model checking algorithm due to the alternating semantics for two players in games. It consists of two phases that alternate: the overapproximation phase, and the underapproximation phase. Each phase runs a DFA learning algorithm which is interrupted when we switch to the other phase, and continued when we switch back, until a decision is made. Together, both phases maintain two approximations, $\underline{H}$ and $\overline{H}$, such that $L(\underline{H}) \subseteq L(T) \subseteq L(\overline{H})$.

The objective of the overapproximation phase is to attempt to learn a DFA $\overline{H}$ satisfying $L(T) \subseteq L(\overline{H})$, and such that Controller wins in $G \| \overline{H}$. The learning algorithm uses membership and inclusion oracles just like in Section 2.2. Once such a candidate DFA $\overline{H}$ is found, the synthesis oracle checks, using finite-state techniques, whether Controller has a winning strategy in $G \| \overline{H}$. If this is the case, we stop and conclude that Controller wins in G ∥ T by Lemma 2. Otherwise, Environment has a winning strategy S in this game, and we switch to the underapproximation phase.

The goal of the underapproximation phase is to check whether the given Environment strategy S can be proved to be spurious. Intuitively, we would like to check whether $L((G \| \overline{H})^S) \subseteq L(T)$ and reject if this is the case. In fact, by Lemma 2, we know that a winning Environment strategy in G ∥ T implies that there is such a strategy S. This is the source of incompleteness of our algorithm, since this condition is necessary but not sufficient for Environment to win; that is, the condition does not guarantee that Environment actually wins in G ∥ T.

While $L((G \| \overline{H})^S) \subseteq L(T)$ can be checked with a timed automaton model checker (see Checking Containment below), this would mean exploring the large state space due to G. Since we want to avoid using timed automata model checkers on such large instances, we rather learn an underapproximation $\underline{H}$ of L(T) using the membership and containment oracles, and use a finite-state model checker to check $L((G \| \overline{H})^S) \subseteq L(\underline{H})$. Note that although the learning process does require inclusion checks of the form $L(\underline{H}) \subseteq L(T)$, this check is feasible with a timed automaton model checker since $\underline{H}$ is typically much smaller than G. If the above check passes, then we reject the instance, that is, we declare the system not controllable. Otherwise, some trace w appears in $L((G \| \overline{H})^S)$ but not in $L(\underline{H})$. If w ∈ L(T), then we require that w be included in $\underline{H}$, and continue the learning process. Otherwise, S is not valid since it induces w, which is not in L(T). So we interrupt the current phase and switch back to the overapproximation phase, requiring w to be removed from $\overline{H}$.

Membership and inclusion oracles are implemented with a timed automata model checker. Here, the synthesis oracle can be any finite game solver; we just need the capability of computing the controlled system $(G \| \overline{H})^S$. Such a system is finite-state, so the strategy containment oracle can be implemented using a finite-state model checker (since $\underline{H}$ is deterministic and can thus be complemented). It remains to explain how the containment oracle is implemented.

Checking Containment $L(\underline{H}) \subseteq L(T)$. First, notice that, even with determinism assumptions on T, the untimed language of the timed automaton complement of T is not the complement of L(T). To see this, consider a timed automaton with a single state which is both initial and accepting, a single clock x, and a self-loop with guard x = 1, labeled by σ. Then, both L(T) and $L(T^c)$ are the language $\sigma^*$, where $T^c$ denotes the timed automaton complement.

Nevertheless, assuming the label-determinism of T, this check can be done by a simple adaptation of a zone-based exploration algorithm, as follows. Let us assume that accepting states are reachable from all states of $\underline{H}$, which can be ensured by a preprocessing step. We start exploring the timed automaton $\underline{H} \| T$ using a zone-based exploration algorithm [11]. Consider any search node $((q_H, q_T), Z)$ encountered during the exploration algorithm, reachable by the trace w, where $(q_H, q_T)$ is a location of $\underline{H} \| T$, and Z a zone. The exploration algorithm generates all available successors for σ ∈ Σ. We make the following additional check: if there is σ′ ∈ Σ such that $q_H$ has a successor by σ′, but not T (either because there is no such edge, or because the guard of the unique edge labeled by σ′ is not satisfied by Z), then we stop and return the trace $w\sigma' \in L(\underline{H}) \setminus L(T)$ as a counterexample to containment. If no such label can be found, the zone-based exploration will terminate and the algorithm confirms the containment.

As an alternative, one can use testing such as the Wp-method [36] to establish the containment, as is customary in DFA learning. In this case, the answer is approximate in the sense that the conformance test can fail to detect that containment does not hold. However, this does not affect the soundness of the overall algorithm since it can only increase false negatives.

#### 3.4 Experiments

Our tool accepts instances G ∥ T where G is given as a Verilog module, and T as a TChecker timed automaton. Some of the inputs of the Verilog module are uncontrollable (chosen by Environment), some others are controllable (chosen by Controller). We use outputs of the Verilog module to define the synchronization labels Σ, while TChecker models tag each transition with such a label.

Table 2. The results of the controller synthesis experiments. The columns #Clks, #C, #M respectively show the number of clocks in the model, the numbers of conjectures and membership queries made by the compositional algorithm; while $|\overline{H}|$, $|\underline{H}|$ show the sizes of the DFAs learned by the two phases.


Membership, inclusion, and containment queries are answered by TChecker. For the synthesis oracle, we used the game solver Abssynthe [12]. Abssynthe's input format is the and-inverter graph (AIG) format. For translating Verilog modules to AIG circuits, we use berkeley-abc and yosys. Abssynthe is able to compute the winning strategy S for the winning player; it also computes the system controlled by S in this case as an AIG circuit. The strategy containment oracle is implemented using NuSMV; since $\underline{H}$ is deterministic, one can complement it, and check whether the intersection with $(G \| \overline{H})^S$ is empty.

The tool uses two Java threads to implement both learning phases, which are interrupted and continued while switching phases. Note that the very first learning step of $\overline{H}$ and $\underline{H}$ can be parallelized since the first underapproximation conjecture $\underline{H}$ does not depend on S.

We evaluate our algorithm with two classes of benchmarks (Table 2). The only tool to which we compare is Uppaal-TIGA [6] since Synthia [44] is not available anymore, and we are not aware of any other timed game solver.

In the scheduling benchmarks, there are two sporadic tasks that arrive nondeterministically, but constrained by the timed automaton. The controller must schedule these using two machines which have internal states, modeled either by a simple 8-bit counter, or by a genbuf circuit from the SYNTCOMP database. The scheduling duration depends on the internal state: some states require executing two external tasks, some others require executing three. The external task has a nondeterministic duration constrained by the timed automaton. The internal states change when a task is finalized. The controller loses if all machines are busy upon the arrival of a new task, or if it schedules a task on a busy machine. Uppaal TIGA was able to solve the counter models since they induce a smaller state space, but failed on the genbuf models. The compositional algorithm could efficiently handle these models. Uppaal was generally able to determine very quickly when the model is not controllable by finding a small counterstrategy, while the compositional algorithm had an overhead: it had to learn $\overline{H}$ and $\underline{H}$ before it could find and check the counterexample.

In the planning benchmarks, a robot and an obstacle are moving in a 6×6 grid (or 9×9 for the stateless case). Each agent can decide to move to an adjacent cell when it is scheduled, and the scheduling times are determined by a timed automaton. The goal of the robot is to avoid the obstacle. In the genbuf case, there are moreover internal states that can cause a glitch and prevent the agents from performing their moves, depending on their states. Uppaal TIGA was not able to manage the large state space unlike the compositional algorithm in this case, but both were able to solve the stateless case.

#### 4 Conclusion

Related Works Perhaps the most closely related approach to our compositional model checking algorithm is trace abstraction refinement [25]. This was originally applied to program verification, and consists in building a network of finite automata that recognizes the program's control flow paths that are infeasible. One refines this language by model checking the control flow graph intersected with the complement of the automaton. Thus, the semantics of the variables of the program are abstractly represented by the finite automaton. This idea was applied to timed automata as well [54,15]. However, the generalization of the counterexamples which ensures convergence turns out to be less effective in timed automata. We attempted to obtain an implementation, but could only confirm the poor performance for model checking timed automata as in [15] (we do not include these results here). It might be that simpler graph structures such as control flow graphs of programs are necessary for this approach to scale; further investigation is also necessary to study better generalization methods.

The learning-based compositional reasoning approach of [43] is also related to counter-example guided abstraction refinement (CEGAR) [18]. In fact, the automata learning algorithm builds an overapproximation of one of the components, and refines it as needed, guided by counterexamples. The difference is that, instead of using predicates, one uses automata to represent the overapproximation. A discussion can also be found in [43].

Learning algorithms for event-recording automata, a subset of timed automata, were studied in [24]. The algorithm of [43] was extended to these automata in [35]. In the context of parameter synthesis with learning, parameterized systems were seen as a parallel composition of a non-parameterized component and a parameterized component in [2].

Other approaches targeting the formal verification of real-time systems with large discrete state spaces include encodings of timed automata semantics in Boolean logic [33,49]. An extension of and-inverter graphs that uses predicates to represent the state space of linear hybrid automata was used in [20].

The abstract interpretation of games was studied in [27], which presents a theory allowing one to define under- and over-approximations. Abstraction-refinement algorithms based on counterexamples were given in [26,21]. These ideas were applied to timed games in [44]. Several abstraction-refinement and compositional algorithms were given in [12,13] for solving finite-state games given as Boolean circuits. The synthesis competition gathers every year researchers who present their game solvers [31,32].

Perspectives The algorithm we presented builds finite-state abstractions of real-time constraints, which it represents as DFAs. The approach is well adapted when the interaction alphabet between A and T is small; this is the case, for instance, for distributed systems where the time constraints are used to describe the approximate period with which each process communicates with its neighbors; so the alphabet contains only a few symbols per process. Some of the benchmarks we considered are models of such systems. The approach is less convenient for time-intensive systems such as, say, job shop scheduling problems where a separate alphabet symbol is needed for each task.

As future work, we would like to understand which of the abstraction schemes is efficient in which situations, among the approach presented here, the predicate-abstraction approach, and zone-based state-space exploration. Currently, all algorithms fail on some benchmarks. Understanding the strengths of each algorithm might help in designing a uniformly better solution. Currently, we can only verify linear properties; one might verify branching-time properties by learning automata with a stronger notion of equivalence such as bisimulation. In fact, an important limitation is due to learning being slow for large alphabets. Our setting could be extended to deal with large or symbolic alphabets, e.g., [37,38].

For synthesis, our setting is currently restricted by the abstractions we use since when the algorithm rejects the instance, we cannot conclude whether the system is controllable or not. Using both the under- and overapproximations within the finite-state synthesis, for instance, using the three-valued abstraction approach [21] might allow us to render the approach complete, and to consider a larger class of timed games such as those that allow Controller to select nonzero delays.

Data Availability Statement Source codes, executables, and benchmark data are available as an artifact [47].

### References



## **Graphs/Probabilistic Systems**

## A Truly Symbolic Linear-Time Algorithm for SCC Decomposition

Casper Abild Larsen, Simon Meldahl Schmidt, Jesper Steensgaard, Anna Blume Jakobsen, Jaco van de Pol, and Andreas Pavlogiannis

> Aarhus University, Aarhus, Denmark {jaco,pavlogiannis}@cs.au.dk

Abstract. Decomposing a directed graph into its strongly connected components (SCCs) is a fundamental task in model checking. To deal with the state-space explosion problem, graphs are often represented symbolically using binary decision diagrams (BDDs), which have exponential compression capabilities. The theoretically best symbolic algorithm for SCC decomposition is Gentilini et al.'s Skeleton algorithm, which uses O(n) symbolic steps on a graph of n nodes. However, Skeleton uses Θ(n) symbolic objects, as opposed to (poly-)logarithmically many, which is the norm for symbolic algorithms, thereby relinquishing its symbolic nature. Here we present Chain, a new symbolic algorithm for SCC decomposition that also makes O(n) symbolic steps, but further uses logarithmic space, and is thus truly symbolic. We then extend Chain to ColoredChain, an algorithm for SCC decomposition on edge-colored graphs, which arise naturally in model-checking a family of systems. Finally, we perform an experimental evaluation of Chain among other standard symbolic SCC algorithms in the literature. The results show that Chain is competitive on almost all benchmarks, and often faster, while it clearly outperforms all other algorithms on challenging inputs.

Keywords: Binary decision diagrams · Strongly connected components · Colored graphs

### 1 Introduction

Strongly connected components (SCCs) are one of the most elegant and widely applicable concepts of graph theory. They play a fundamental role in model checking for LTL and ω-regular properties, as most model-checking tasks reduce to locating cycles that traverse certain vertices in a graph [26], while strong fairness assumptions typically require an SCC decomposition at hand [21,31]. SCCs are also a key step to characterizing the attractor properties of systems, such as bottom SCCs in Markov Chains [2] and maximal end components in Markov Decision Processes [12]. From an algorithmic point of view, the simplest approach to SCC decomposition is by running a forward-backward reachability analysis from each vertex, which results in $O(n^2)$ time on a graph of n vertices. The celebrated Tarjan's algorithm [28], and subsequently Dijkstra's algorithm [15] and Kosaraju-Sharir's algorithm [27] have reduced the complexity down to O(n).

In the everyday practice of model checking, systems are represented as symbolic, rather than explicit graphs. One predominant symbolic representation is via (reduced/ordered) Binary Decision Diagrams (BDDs) [9], which are found at the core of many classic and modern model checkers [13,23,19,24,3]. BDDs can offer exponential compactness of the huge state space typically involved in the model-checking task, by succinctly encoding symmetries abundant in the represented system. On the other hand, this symbolic representation gives only coarse-grained efficient access to the graph. In particular, one can query for the image and preimage of a set of vertices with respect to the edge relation, which accounts for one symbolic step. Although the time for performing a symbolic step may vary, it is typically significantly larger than the time taken to perform elementary operations (e.g., incrementing a counter). As such, symbolic steps serve as the complexity measure of symbolic algorithms [8,18,11].

The simplest symbolic algorithm for SCC decomposition is the FwdBwd algorithm, which computes the SCC of a vertex u as the intersection of its forward and backward sets (as in the explicit setting). As this results in $O(n^2)$ time complexity, the algorithm is often too slow in practice. The key challenge towards efficient symbolic SCC algorithms is the seeming difficulty to traverse the input graph G in a depth-first fashion, which is the technical underpinning of the O(n)-time explicit SCC algorithms. Nevertheless, a series of improvements have been made in this direction: (i) a variant of FwdBwd was shown in [30] to run in time O(δn), where δ is the diameter of G, and only becomes quadratic when δ = Θ(n), (ii) the LockStep algorithm [7] has complexity O(n log n), while (iii) the Skeleton algorithm with complexity O(n) is provably optimal [11]. Practical improvements based on heuristics have also been proposed [29,16,31].
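As an illustration of the basic FwdBwd scheme, the following minimal sketch computes SCCs using only image and preimage operations on sets of vertices. It assumes explicit Python sets as a stand-in for BDDs, so each call to the hypothetical helpers `post` and `pre` plays the role of one symbolic step; it is the simple quadratic scheme, not any of the improved algorithms.

```python
def post(edges, xs):
    # Image of the vertex set xs under the edge relation.
    return {v for (u, v) in edges if u in xs}


def pre(edges, xs):
    # Preimage of the vertex set xs under the edge relation.
    return {u for (u, v) in edges if v in xs}


def reach(step, edges, start, universe):
    """Vertices reachable from start by iterating step inside universe."""
    reached, frontier = set(start), set(start)
    while frontier:
        frontier = (step(edges, frontier) & universe) - reached
        reached |= frontier
    return reached


def fwd_bwd(vertices, edges):
    """Return the SCCs of the graph as a list of frozensets."""
    sccs, remaining = [], set(vertices)
    while remaining:
        pivot = {next(iter(remaining))}
        forward = reach(post, edges, pivot, remaining)
        backward = reach(pre, edges, pivot, remaining)
        scc = forward & backward      # SCC of the pivot
        sccs.append(frozenset(scc))
        remaining -= scc              # the other SCCs are unaffected
    return sccs
```

Removing a whole SCC does not change the SCCs of the remaining vertices, which is why restricting the later reachability computations to `remaining` is safe.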

One characteristic requirement for symbolic algorithms is that they operate in logarithmic symbolic space, i.e., they use logarithmically many objects, with the size of a single symbolic data structure (e.g., a BDD) counting as O(1) [11]. Indeed, without this restriction, an algorithm could extract, and later analyze, an explicit representation of its input graph, thereby relinquishing its symbolic nature. Unfortunately, the theoretically optimal Skeleton algorithm uses Θ(n) space, thereby violating the logarithmic-space requirement. As such, we find that Skeleton is not truly symbolic, which also has a measurable effect: perhaps paradoxically, Skeleton is often the slowest algorithm in practice.

#### 1.1 Our Contributions

The Chain algorithm. We present a new algorithm, Chain, for symbolically computing SCC decompositions. On input graph G with n vertices, Chain takes time $O(\sum_{S \in \mathrm{SCCs}(G)} (\delta(S) + 1)) = O(n)$, where SCCs(G) denotes the SCCs of G and δ(S) is the diameter of S. It is known that $\Omega(\sum_{S \in \mathrm{SCCs}(G)} (\delta(S) + 1))$ is also a lower bound for the problem [11], thus Chain is optimal. Moreover, Chain uses O(log n) symbolic data structures, thus being truly symbolic.

It is worth highlighting that Chain offers optimality while also being arguably the simplest among all symbolic SCC decomposition algorithms beyond FwdBwd. Indeed, Chain simply extends FwdBwd to accept as an argument a set of vertices K, among which to choose a pivot in the current recursive call. It is perhaps surprising that such a simple mechanism has been elusive for decades, as all previous efforts [30,7,17] relied on more elaborate procedures to either reduce or refine the $O(n^2)$ time bound. That being said, our new mechanism is somewhat insightful and with a non-trivial complexity analysis.

The ColoredChain algorithm. We extend Chain to ColoredChain for computing SCCs on edge-colored graphs, in which edges have colors, and SCCs are formed by restricting to monochromatic paths. Although a graph of p colors can be handled in O(pn) time by breaking it to its monochromatic components and executing Chain on each of them, ColoredChain handles all colors simultaneously, thus benefiting from the symbolic compression of the edge relation across multiple colors. A similar approach was followed recently [6], by extending the standard LockStep algorithm [7] to colored graphs. However, the corresponding colored LockStep algorithm runs in time O(pn log n), as it inherits the log n factor from the basic LockStep algorithm.

Experimental evaluation. We implement and evaluate Chain in controlled, synthetic, and previously-used experimental settings. We find that Chain is never notably slower than other, standard algorithms, except when compared to LockStep on a few benchmarks. On the other hand, Chain is measurably faster than all other algorithms on demanding inputs. We further evaluate ColoredChain on colored Boolean Networks, used recently for the colored LockStep algorithm [6]. Our results indicate that ColoredChain is considerably faster than LockStep, making it a promising alternative for the analysis of Boolean networks.

#### 2 Preliminaries

Here we set up our main notation on graphs, SCCs, and symbolic algorithms.

General notation. Given a natural number ℓ ∈ ℕ, we let [ℓ] = {1, 2, …, ℓ}.

Graphs. We consider (directed) graphs G = (V, E), where V is a set of n vertices and E ⊆ V × V is a set of edges. Given a set X ⊆ V, the restriction of G to X is the graph G[X] = (X, E ∩ (X × X)). For a vertex v, we let Pre(v) = {u : (u, v) ∈ E} and Post(v) = {u : (v, u) ∈ E} denote the preimage and image of v under E, respectively. We lift this notation to sets of vertices X by letting Pre(X) = ⋃<sub>v∈X</sub> Pre(v) and Post(X) = ⋃<sub>v∈X</sub> Post(v). A path from v to u in G is a sequence of vertices P : v = w<sub>1</sub>, w<sub>2</sub>, …, w<sub>ℓ</sub> = u such that, for each i ∈ [ℓ − 1], we have (w<sub>i</sub>, w<sub>i+1</sub>) ∈ E. The length of P is |P| = ℓ − 1; in particular, a single vertex v is a path of length 0. We write v ⇝ u to denote the existence of a path from v to u, and call u reachable from v if there is such a path in G. For a vertex v ∈ V, we let Fwd(v) and Bwd(v) denote the reflexive transitive closure of Post(v) and Pre(v), respectively. In other words, Fwd(v) (resp., Bwd(v)) contains the vertices that are reachable from v (resp., can reach v). Given an additional set X ⊆ V, we let Fwd(v, X) and Bwd(v, X) denote the forward and backward set, respectively, of v in the graph G[X]. The distance from v to u is the length of a shortest path v ⇝ u, i.e., d(v, u) = min<sub>P : v⇝u</sub> |P|, where we take the minimum of an empty set to be ∞. The diameter of a set X ⊆ V is δ(X) = max<sub>v,u∈X, v⇝u</sub> d(v, u), i.e., it is the maximum distance between any pair of vertices in X, provided that they are connected by a path.
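To make these definitions concrete, the following Python sketch evaluates Post, Pre, forward sets with distances, and the diameter on an explicitly stored edge set. Explicit Python sets stand in for symbolic sets here, and the helper names (`post`, `pre`, `distances_from`, `diameter`) are illustrative assumptions, not notation from the paper.

```python
def post(edges, xs):
    """Post(X): image of the vertex set xs under the edge relation."""
    return {u for (v, u) in edges if v in xs}


def pre(edges, xs):
    """Pre(X): preimage of the vertex set xs under the edge relation."""
    return {v for (v, u) in edges if u in xs}


def distances_from(edges, v):
    """Map every vertex u of Fwd(v) to d(v, u), exploring layer by layer."""
    dist, layer, d = {v: 0}, {v}, 0
    while layer:
        d += 1
        layer = {u for u in post(edges, layer) if u not in dist}
        for u in layer:
            dist[u] = d
    return dist


def diameter(edges, xs):
    """delta(X): largest finite distance between two vertices of xs."""
    return max(d for v in xs
               for u, d in distances_from(edges, v).items() if u in xs)


# A 3-cycle 0 -> 1 -> 2 -> 0 with an extra edge 2 -> 3.
E = {(0, 1), (1, 2), (2, 0), (2, 3)}
fwd0 = set(distances_from(E, 0))                           # Fwd(0)
bwd0 = set(distances_from({(u, v) for (v, u) in E}, 0))    # Bwd(0)
```

In this example Fwd(0) ∩ Bwd(0) is the SCC of vertex 0, which is the observation that FwdBwd-style algorithms build on.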

Strongly connected components (SCCs). A set X ⊆ V is strongly connected if, for every two vertices v, u ∈ X, we have v ⇝ u. A strongly connected component (SCC) of G is a maximal strongly connected set S ⊆ V. Given a vertex v ∈ V, we let SCC(v) denote its SCC. We let SCCs(G) denote the set of SCCs of G; note that SCCs(G) induces a partitioning of V. A set X ⊆ V is called SCC-closed if for every S ∈ SCCs(G), we have either S ⊆ X or S ∩ X = ∅. In other words, for every v ∈ X, we have SCC(v) ⊆ X. We sometimes call G[X] SCC-closed, to indicate that X is SCC-closed (in G).

Symbolic operations and complexity measures. We consider that graphs are represented symbolically using Binary Decision Diagrams (BDDs) [9]. The symbolic representation implies that efficient access to the graph can only be carried out in a coarse-grained way. In particular, given a symbolically represented set of vertices X, a symbolic operation on X is either Pre(X) or Post(X), and serves as the unit of time in measuring the time complexity of symbolic algorithms. As is standard, we also perform common set operations such as union, intersection, and difference, and use a specialized function Pick(X) that returns an arbitrary vertex u ∈ X. This operation is natural in symbolic SCC algorithms, as typically one needs to identify a specific vertex u in order to output SCC(u). In alignment with the symbolic time complexity, the symbolic space complexity of an algorithm is measured in the number of objects (symbolic or not) that it uses. As symbolic representations usually allow large (and sometimes even exponential) compression in the context they are designed for, we require symbolic algorithms to operate in logarithmic symbolic space [11].

### 3 The Chain Algorithm

In this section we present the main result of this paper: a new algorithm, called Chain, that runs in linear time and is truly symbolic (i.e., it uses O(log n) symbolic memory). In particular, we establish the following theorem.

Theorem 1. Given a graph G = (V, E) of n nodes, Chain computes SCCs(G) in O(∑<sub>S∈SCCs(G)</sub> (δ(S) + 1)) symbolic time and O(log n) symbolic space.

Note that O(∑<sub>S∈SCCs(G)</sub> (δ(S) + 1)) = O(n), as SCCs(G) partitions V, while for each S ∈ SCCs(G) we have δ(S) ≤ |S|. It is worth observing that ∑<sub>S∈SCCs(G)</sub> (δ(S) + 1) can, however, be much smaller than n: e.g., over cliques G, this bound becomes O(1). On the other hand, it is known that Ω(∑<sub>S∈SCCs(G)</sub> (δ(S) + 1)) is also a lower bound for the problem [11], hence Theorem 1 is tight. As was shown in [11], a more refined analysis of the Skeleton algorithm also achieves the time bound of Theorem 1. However, Skeleton suffers a linear space bound, and thus is not truly symbolic.
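As a quick sanity check of this bound, the following worked instances evaluate the sum on three simple families with n ≥ 2 vertices each. The families are our own illustrative choice: the complete directed graph K<sub>n</sub>, the directed cycle C<sub>n</sub>, and the directed line L<sub>n</sub>.

$$\begin{split} K\_n &: \text{one SCC with } \delta = 1, \text{ so } \sum\_{S} (\delta(S) + 1) = 2 \\ C\_n &: \text{one SCC with } \delta = n - 1, \text{ so } \sum\_{S} (\delta(S) + 1) = n \\ L\_n &: n \text{ singleton SCCs with } \delta = 0, \text{ so } \sum\_{S} (\delta(S) + 1) = n \end{split}$$

The clique shows how far the bound can drop below n, while the cycle and the line show that O(n) is attained both by one large SCC and by many trivial ones.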

In the following, we first present Chain in detail in Section 3.1. Its correctness is relatively straightforward, and is established in Section 3.2. Its complexity analysis, on the other hand, is more involved, and is presented in Section 3.3.

#### 3.1 Algorithm

Here we present Chain in detail, develop some intuition behind its time complexity, and illustrate its execution on a small example.

```
Algorithm 1: Chain
   Input: A graph G = (V, E), a vertex set K ⊆ V
 1 if V = ∅ then return
 2 if K ≠ ∅ then                     // Pick a pivot on the chain, if possible
 3     v = Pick(K)
 4 else
 5     v = Pick(V)
 6 F = ∅; Last = ∅; Layer = {v}; S = {v}
 7 while Layer ≠ ∅ do                // Compute Fwd(v, V)
 8     F = F ∪ Layer
 9     Last = Layer
10     Layer = Post(Layer) \ F
11 while Pre(S) ∩ F ⊈ S do           // Compute SCC(v)
12     S = S ∪ (Pre(S) ∩ F)
13 output S
14 Chain(G[F \ S], Last \ S)         // Recursive call on the forward set
15 Chain(G[V \ F], Pre(S) \ F)       // Recursive call on the rest
```
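The Chain algorithm. Algorithm 1 presents Chain in pseudocode. The principle of operation of the algorithm is, perhaps surprisingly, simple. Given a graph G = (V, E) and a pivot vertex v of G, the algorithm computes SCC(v) in two phases, similarly to the standard FwdBwd algorithm. In particular, a forward phase (the loop in Line 7) computes the forward set F = Fwd(v, V), and a backward phase (the loop in Line 11) computes S = SCC(v) by backward reachability from v restricted to F. The algorithm then outputs S (Line 13) and recurses on G[F \ S] and G[V \ F] (Line 14 and Line 15).
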
However, to avoid the quadratic worst-case complexity of FwdBwd, Chain passes to each recursive call the argument K (initially K = ∅). Whenever K ≠ ∅, the recursive call must pick its next pivot v from K; choosing the right set to pass as K is what lets the algorithm achieve its tight time complexity.

Conceptually, after Fwd(v, V) has been computed, the first recursive call (Line 14) chooses K to be the set of vertices that are at maximum distance from v (and not in SCC(v), as those are output in Line 13). On the other hand, the second recursive call (Line 15) chooses K to be the predecessors of SCC(v). Although the formal complexity analysis is somewhat involved (see Section 3.3), the key, high-level idea is as follows. When computing Fwd(v, V), the algorithm has taken a number of symbolic steps that is proportional to the maximum distance of a vertex from v. The chain of recursive calls starting in Line 14 and followed by all recursive calls in Line 15 until Pre(S) \ F = ∅ ensures that the algorithm will output all SCCs, in reverse order, along a maximal path from v to a vertex in Fwd(v, V) \ SCC(v). This amortizes the high cost of computing Fwd(v, V) in the current call over the cost of outputting these SCCs in future calls, leading to only a constant-factor increase in the overall complexity.

Besides viewing Chain as an augmentation of the FwdBwd algorithm with a restriction on pivots, the algorithm can also be seen as a simplification of the Skeleton algorithm [17]. Indeed, the computation of skeletons in the latter serves exactly the purpose of forcing the recursion to output SCCs in the same order as in our chain argument above. As we show here, computing skeletons is redundant: dropping them makes the algorithm simpler and truly symbolic, without sacrificing any of its time-complexity guarantees.
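The following Python sketch mirrors Algorithm 1 on explicit sets, counting one symbolic step per Pre or Post call. It is a minimal illustration under stated assumptions: Python sets replace BDDs, `min` plays the role of Pick (any element would do), and the names `chain_sccs` and `EXAMPLE_EDGES` are hypothetical.

```python
def chain_sccs(vertices, edges):
    """Return (list of SCCs in output order, number of Pre/Post operations)."""
    steps = 0
    sccs = []

    def post(xs, dom):
        nonlocal steps
        steps += 1
        return {u for (w, u) in edges if w in xs and u in dom}

    def pre(xs, dom):
        nonlocal steps
        steps += 1
        return {w for (w, u) in edges if u in xs and w in dom}

    def chain(V, K):
        if not V:
            return
        v = min(K) if K else min(V)        # Pick, preferring the chain K
        F, last, layer, S = set(), set(), {v}, {v}
        while layer:                        # forward phase: F = Fwd(v, V)
            F |= layer
            last = layer
            layer = post(layer, V) - F
        while True:                         # backward phase: S = SCC(v)
            new = (pre(S, V) & F) - S
            if not new:
                break
            S |= new
        sccs.append(frozenset(S))
        chain(F - S, last - S)              # Line 14
        chain(V - F, pre(S, V) - F)         # Line 15

    chain(set(vertices), set())
    return sccs, steps


# SCCs of this example: {0, 1}, {2, 3}, {4}, {5}.
EXAMPLE_EDGES = {(0, 1), (1, 0), (1, 2), (2, 3), (3, 2), (3, 4), (5, 0)}
components, steps = chain_sccs(range(6), EXAMPLE_EDGES)
```

On this example the first call pays for a long forward search from vertex 0, and the subsequent calls pick their pivots from K (first vertex 4, then vertex 3), walking back along the chain toward SCC(0).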

Example. Fig. 1 illustrates Chain on a graph G = (V, E) (left). The tree T (right) represents the recursion of Chain as it outputs SCCs(G). We identify every vertex of T by a vertex v ∈ V for which SCC(v) is computed in the corresponding step. We subscript variables of the algorithm with v to denote their value at that step. E.g., V<sub>v</sub> denotes the vertex set in the recursive call that computed SCC(v), and F<sub>v</sub> denotes the forward set computed after the loop of Line 7 has completed. The edges of T are labeled with the line that performed the respective recursive call.

The key observation for understanding the complexity of Chain is as follows. In the first step, the algorithm has paid the high cost of 5 symbolic steps to compute F<sub>1</sub>, while its output is a small SCC of 2 vertices. However, the path 1 →<sup>14</sup> 6 →<sup>15</sup> 4 →<sup>15</sup> 3 in T forms a chain from vertex 6, which is at maximum distance from 1, back to vertex 3, which is adjacent to SCC(1). The cost of computing F<sub>1</sub> can thus be amortized over outputting the SCCs along this chain (i.e., SCC(3), SCC(4), SCC(6)), yielding only a linear overhead. As we prove in Section 3.3, this behavior is not accidental, but guaranteed in every recursive call.

#### 3.2 Correctness

We start with the soundness of Chain, i.e., it only outputs SCCs of G.

Fig. 1. An input graph (left), and the recursive computation of Chain (right).

Lemma 1. In every call of Chain, Line 13 outputs an SCC of G.

Proof. Consider any call to Chain on input G′ = (V′, E′), K′, with K′ ⊆ V′. The algorithm first picks a vertex v from either V′ or K′, with v ∈ S, where S is the set output in Line 13. It is straightforward to see that, after the loop in Line 7 has executed, we have F = Fwd(v, V′), while after the loop in Line 11 has executed, we have S = Fwd(v, V′) ∩ Bwd(v, V′). It suffices to argue that G′ is an SCC-closed subgraph of G, which implies that S = SCC(v).

The statement is true initially, as G′ = G. Now, assuming that the statement holds on some input G′ = (V′, E′), K′, we argue that each of G′[F \ S] and G′[V′ \ F], in Line 14 and Line 15, respectively, is SCC-closed. Indeed, F is closed under Post operations and thus SCC-closed. As S is an SCC of G′, we have that F \ S is also SCC-closed. Since F \ S, S, and V′ \ F partition V′, we have that G′[V′ \ F] is also SCC-closed. The desired result follows. □

Lemma 2. Chain outputs every SCC in SCCs(G) exactly once.

Proof. The statement follows from the fact that, in every recursive call on input G′ = (V′, E′), the sets F \ S, S, and V′ \ F partition V′. □

#### 3.3 Complexity Analysis

We now present the (symbolic) time and space complexity analysis of Chain. For measuring time, we only count the number of Pre(·) and Post(·) operations.

Consider any input G = (V, E), and let T be the recursion tree produced by the execution of Chain on G, as in Fig. 1. We use lowercase (resp., uppercase) letters to refer to the vertices of G (resp., T), and we subscript the variables of the algorithm with vertices of T (e.g., V<sub>A</sub>) to refer to their values in the recursive call associated with A. T has labeled directed edges A →<sup>f</sup> B, where f ∈ {14, 15} denotes the line of the recursive call that made B a child of A in T. Without loss of generality, we consider that every vertex A of T corresponds to a recursive call with V<sub>A</sub> ≠ ∅.

Main complexity analysis. Consider an edge A →<sup>14</sup> B in T, and the path A →<sup>14</sup> B<sub>1</sub> →<sup>15</sup> B<sub>2</sub> →<sup>15</sup> … →<sup>15</sup> B<sub>k</sub>, where B<sub>k</sub> is the first vertex B on the path for which Pre(S<sub>B</sub>) \ F<sub>B</sub> = ∅ in Line 15. Let Levels(A) denote the number of iterations of the loop in Line 7, and note that Levels(A) = max<sub>u∈V<sub>A</sub></sub> d(v<sub>A</sub>, u). The crux of the complexity proof of Chain is the following lemma.

Lemma 3. Levels(A) ≤ δ(SCC(v<sub>A</sub>)) + 1 + ∑<sub>i∈[k]</sub> (δ(SCC(v<sub>B<sub>i</sub></sub>)) + 1).

Before we prove Lemma 3, we show how it leads to the complexity of Theorem 1. Given a vertex A of T, let 𝒯(A) denote the running time of Chain on the subtree of T rooted at A. Let A →<sup>14</sup> B and A →<sup>15</sup> C be the children of A, and let A →<sup>14</sup> B<sub>1</sub> →<sup>15</sup> B<sub>2</sub> →<sup>15</sup> … →<sup>15</sup> B<sub>k</sub> be the path defined above (thus B<sub>1</sub> = B). Then 𝒯(A) satisfies the following recurrence.

$$\begin{split} \mathcal{T}(A) &\leq \underbrace{\text{Levels}(A)}\_{\text{loop in Line 7}} + \underbrace{\delta(\text{SCC}(v\_A)) + 1}\_{\text{loop in Line 11}} + \underbrace{\mathcal{T}(B)}\_{\text{Line 14}} + \underbrace{1 + \mathcal{T}(C)}\_{\text{Line 15}} \\ &\leq \sum\_{i \in [k]} (\delta(\text{SCC}(v\_{B\_i})) + 1) + \delta(\text{SCC}(v\_A)) + 1 \\ &\quad + \delta(\text{SCC}(v\_A)) + 1 + \mathcal{T}(B) + 1 + \mathcal{T}(C) \qquad \text{[Lemma 3]} \\ &= \sum\_{i \in [k]} (\delta(\text{SCC}(v\_{B\_i})) + 1) + 2\delta(\text{SCC}(v\_A)) + 3 + \mathcal{T}(B) + \mathcal{T}(C) \end{split}$$

Each vertex B<sub>i</sub> contributing to the sum ∑<sub>i∈[k]</sub> (δ(SCC(v<sub>B<sub>i</sub></sub>)) + 1) does not appear in such a sum for any other vertex A′ of T. Indeed, assume towards contradiction that for some vertex B<sub>i</sub> there are two vertices A ≠ A′ and paths

$$P \colon A \xrightarrow{14} B\_1 \xrightarrow{15} B\_2 \xrightarrow{15} \dots \xrightarrow{15} B\_i \quad \text{and} \quad P' \colon A' \xrightarrow{14} B\_1' \xrightarrow{15} B\_2' \xrightarrow{15} \dots \xrightarrow{15} B\_i'$$

with B′<sub>i</sub> = B<sub>i</sub>. Due to the edge labels, neither path can be a sub-path of the other, which, in turn, contradicts the tree structure of T. Given such a vertex B<sub>i</sub>, let A(B<sub>i</sub>) denote its unique ancestor in T that appears as vertex A in the path P above. The total running time of Chain on G is at most ∑<sub>B∈T</sub> (3δ(SCC(v<sub>B</sub>)) + 4), obtained by counting for each vertex B of T (i) the 2δ(SCC(v<sub>B</sub>)) + 3 symbolic operations from its own recursive call, plus (ii) δ(SCC(v<sub>B</sub>)) + 1 symbolic operations from the call at A(B). Hence the total number of symbolic steps is O(∑<sub>S∈SCCs(G)</sub> (δ(S) + 1)).
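Spelled out, this accounting can be written as the chain of inequalities below. It restates the argument of the text rather than adding a new result. The symbol k<sub>A</sub> is a name introduced here for illustration: it denotes the length k of the path associated with A, with k<sub>A</sub> = 0 when A has no child via Line 14.

$$\begin{split} \mathcal{T}(\text{root}) &\leq \sum\_{A \in T} \Big( 2\delta(\text{SCC}(v\_A)) + 3 + \sum\_{i \in [k\_A]} (\delta(\text{SCC}(v\_{B\_i})) + 1) \Big) \\ &\leq \sum\_{B \in T} (2\delta(\text{SCC}(v\_B)) + 3) + \sum\_{B \in T} (\delta(\text{SCC}(v\_B)) + 1) \\ &= \sum\_{B \in T} (3\delta(\text{SCC}(v\_B)) + 4) \leq 4 \sum\_{S \in \text{SCCs}(G)} (\delta(S) + 1) \end{split}$$

The second inequality uses that every vertex B of T is charged by at most one ancestor A(B). The last one uses that the vertices of T correspond one-to-one to the SCCs of G (Lemma 2), together with 3δ + 4 ≤ 4(δ + 1).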

Proof of Lemma 3. We now turn our attention to the proof of Lemma 3. Consider again the path A →<sup>14</sup> B<sub>1</sub> →<sup>15</sup> B<sub>2</sub> →<sup>15</sup> … →<sup>15</sup> B<sub>k</sub> of T as defined above. For simplicity of notation, let v<sub>i</sub> = v<sub>B<sub>i</sub></sub>, for i ∈ [k]. Clearly SCC(v<sub>i</sub>) ≠ SCC(v<sub>j</sub>) for i ≠ j. We start with two simple lemmas.

Lemma 4. For every i ∈ [k], we have K<sub>B<sub>i</sub></sub> ≠ ∅.

Proof. The statement holds for i = 1, since otherwise Last<sub>A</sub> \ S<sub>A</sub> = ∅, implying that F<sub>A</sub> \ S<sub>A</sub> = V<sub>B<sub>1</sub></sub> = ∅, and thus B<sub>1</sub> would not be a vertex of T. The statement also holds for all i > 1, by construction of the path to B<sub>k</sub>. □

Lemma 5. For all i ∈ [k − 1], we have v<sub>i</sub> ∈ Fwd(v<sub>k</sub>).

Proof. The lemma follows from the more general statement that v<sub>i</sub> ∈ Fwd(v<sub>i+1</sub>). Indeed, by Lemma 4, we have that v<sub>i+1</sub> ∈ Pre(S<sub>B<sub>i</sub></sub>), while S<sub>B<sub>i</sub></sub> = SCC(v<sub>i</sub>). □

We call a vertex u critical if it is the first vertex w in a path from v<sub>A</sub> to v<sub>k</sub> in V<sub>A</sub> such that w ∉ SCC(v<sub>A</sub>). We further call a path u ⇝ v<sub>k</sub> critical if u is a critical vertex. In the example of Fig. 1, for the first call to Chain, where v<sub>A</sub> = 1, vertex 3 is a critical vertex and the path 3 → 4 → 5 → 6 is a critical path. The following lemma captures the fact that every recursive call B<sub>i</sub> is performed on a vertex set V<sub>B<sub>i</sub></sub> that is adjacent to SCC(v<sub>A</sub>).

Lemma 6. For all i ∈ [k], the set V<sub>B<sub>i</sub></sub> has a critical path.

Proof. The proof is by induction on i. For i = 1, we have V<sub>B<sub>1</sub></sub> = Fwd(v<sub>A</sub>, V<sub>A</sub>) \ SCC(v<sub>A</sub>). Since A →<sup>14</sup> B<sub>1</sub> in T, we have Fwd(v<sub>A</sub>, V<sub>A</sub>) \ SCC(v<sub>A</sub>) = V<sub>B<sub>1</sub></sub> ≠ ∅, thus the statement holds for i = 1. Now assume that the statement holds for some i ≥ 1, and we argue that it holds for i + 1. Take any critical path P : u ⇝ v<sub>k</sub> in V<sub>B<sub>i</sub></sub>, and assume towards contradiction that P is not a path in V<sub>B<sub>i+1</sub></sub> (i.e., at least one vertex of P is outside V<sub>B<sub>i+1</sub></sub>). Since V<sub>B<sub>i+1</sub></sub> = V<sub>B<sub>i</sub></sub> \ Fwd(v<sub>i</sub>, V<sub>B<sub>i</sub></sub>), we obtain that P has a vertex w with w ∈ Fwd(v<sub>i</sub>, V<sub>B<sub>i</sub></sub>), and hence v<sub>k</sub> ∈ Fwd(v<sub>i</sub>). By Lemma 5, we also have v<sub>i</sub> ∈ Fwd(v<sub>k</sub>), thus SCC(v<sub>i</sub>) = SCC(v<sub>k</sub>), violating the choice of v<sub>i</sub>. Thus V<sub>B<sub>i+1</sub></sub> has a critical path. □

Specifically for the case i = k, the following is a strengthening of Lemma 6, showing that SCC(v<sub>k</sub>) (only a subset of V<sub>B<sub>k</sub></sub>) is also adjacent to SCC(v<sub>A</sub>).

Lemma 7. SCC(v<sub>k</sub>) contains a critical vertex.

Proof. By Lemma 6, we have a critical path u ⇝ v<sub>k</sub> in V<sub>B<sub>k</sub></sub>. By construction, (Pre(SCC(v<sub>k</sub>)) ∩ V<sub>B<sub>k</sub></sub>) \ SCC(v<sub>k</sub>) = ∅, thus u ∈ SCC(v<sub>k</sub>). □

Let v<sub>k+1</sub> be a critical vertex in SCC(v<sub>k</sub>), whose existence is guaranteed by Lemma 7. Given a vertex u ∈ V<sub>A</sub>, we write ℓ(u) for the distance of u from v<sub>A</sub> in V<sub>A</sub>. Note that Levels(A) = ℓ(v<sub>1</sub>). Observe that for all u, v ∈ V<sub>A</sub>, if u ∈ SCC(v) then ℓ(u) − ℓ(v) ≤ δ(SCC(v)). The following two lemmas relate the distances ℓ(v<sub>i</sub>) to the diameters of SCCs, and lead to the proof of Lemma 3.

Lemma 8. We have ℓ(v<sub>k+1</sub>) ≤ δ(SCC(v<sub>A</sub>)) + 1.

Proof. By definition, there is a vertex w ∈ Pre(v<sub>k+1</sub>) ∩ SCC(v<sub>A</sub>). We have ℓ(w) ≥ ℓ(v<sub>k+1</sub>) − 1, while ℓ(w) ≤ δ(SCC(v<sub>A</sub>)), hence ℓ(v<sub>k+1</sub>) ≤ δ(SCC(v<sub>A</sub>)) + 1. □

Lemma 9. For every i ∈ [k], we have ℓ(v<sub>i</sub>) − ℓ(v<sub>i+1</sub>) ≤ δ(SCC(v<sub>i</sub>)) + 1.

Proof. The statement holds trivially when ℓ(v<sub>i</sub>) ≤ ℓ(v<sub>i+1</sub>). Now consider the case that ℓ(v<sub>i</sub>) > ℓ(v<sub>i+1</sub>). If i = k, then by our choice of v<sub>k+1</sub>, we have v<sub>i+1</sub> ∈ SCC(v<sub>i</sub>), thus ℓ(v<sub>i</sub>) − ℓ(v<sub>i+1</sub>) ≤ δ(SCC(v<sub>i</sub>)). Now consider that i < k. By construction, there is a vertex w ∈ SCC(v<sub>i</sub>) ∩ Post(v<sub>i+1</sub>). Then ℓ(v<sub>i</sub>) − ℓ(w) ≤ δ(SCC(v<sub>i</sub>)), while ℓ(w) ≤ ℓ(v<sub>i+1</sub>) + 1, resulting in ℓ(v<sub>i</sub>) − ℓ(v<sub>i+1</sub>) ≤ δ(SCC(v<sub>i</sub>)) + 1. □

Proof (of Lemma 3).

$$\begin{split} \text{Levels}(A) &= \ell(v\_1) = \sum\_{i \in [k]} (\ell(v\_i) - \ell(v\_{i+1})) + \ell(v\_{k+1}) \\ &\leq \sum\_{i \in [k]} (\ell(v\_i) - \ell(v\_{i+1})) + \delta(\text{SCC}(v\_A)) + 1 \qquad \text{[Lemma 8]} \\ &\leq \sum\_{i \in [k]} (\delta(\text{SCC}(v\_i)) + 1) + \delta(\text{SCC}(v\_A)) + 1 \qquad \text{[Lemma 9]} \end{split}$$


Space complexity. Finally, we address the O(log n) symbolic-space complexity of Theorem 1. Chain uses O(1) symbolic sets in each recursive call. To achieve the O(log n) bound, it suffices to execute first whichever of the two recursive calls (Line 14 or Line 15) has the smaller input graph. This results in O(log n) pending recursive calls at any step of the execution, and thus in O(log n) stored symbolic sets overall. Note that this requires a function Count(X) that returns the size of a symbolically represented set X. This is not a problem: BDDs are equipped with such operations, and their complexity is only linear in the size of the representation of X, even though X might be exponentially large.
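The smaller-call-first discipline can be sketched with an explicit list of postponed calls, as below. This is an illustration on explicit sets under our own assumptions, not the paper's implementation: `len` stands in for Count, `min` for Pick, and the name `chain_small_first` is hypothetical. Each postponed entry stores the two sets (V, K) of one pending call.

```python
def chain_small_first(vertices, edges):
    """Chain with the smaller recursive call executed first.

    Returns (sccs, max_pending), where max_pending is the largest number of
    postponed calls that were stored at any point of the execution.
    """
    def post(xs, dom):
        return {u for (w, u) in edges if w in xs and u in dom}

    def pre(xs, dom):
        return {w for (w, u) in edges if u in xs and w in dom}

    sccs, pending, max_pending = [], [], 0
    current = (set(vertices), set())
    while current[0]:
        V, K = current
        v = min(K) if K else min(V)
        F, last, layer, S = set(), set(), {v}, {v}
        while layer:                                  # forward phase
            F |= layer
            last = layer
            layer = post(layer, V) - F
        while not ((pre(S, V) & F) <= S):             # backward phase
            S |= pre(S, V) & F
        sccs.append(frozenset(S))
        calls = [(F - S, last - S), (V - F, pre(S, V) - F)]
        small, large = sorted(calls, key=lambda call: len(call[0]))
        if small[0]:
            pending.append(large)                     # postpone the larger call
            max_pending = max(max_pending, len(pending))
            current = small
        elif large[0]:
            current = large                           # nothing to postpone
        elif pending:
            current = pending.pop()
        else:
            current = (set(), set())
    return sccs, max_pending
```

Since the call that is continued has at most half of the vertices of its parent whenever something is postponed, the list of postponed calls never grows beyond log₂ n entries.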

#### 4 Extension to Colored Graphs

In this section we turn our attention to colored graphs, where the edge relation is parameterized by colors, and SCCs are formed with respect to monochromatic components of the graph. Each edge color stands for a different binary relation, and all colors together allow several graphs to be superposed on top of each other. Although each monochromatic graph could be represented in isolation, this superposition allows for an efficient symbolic representation, especially when the edge relations are highly similar. In turn, this calls for efficient symbolic algorithms that are able to exploit similarities between colors. Our study of this setting is inspired by the recent extension of LockStep to colored graphs [6].

#### 4.1 Edge-Colored Graphs

Here we lift some of our graph notation from Section 2 to the colored setting.

Colored graphs. An edge-colored graph G = (V, C, E) consists of a set of n vertices V, a set of p colors C, and an edge relation E ⊆ V × C × V. Given a color c ∈ C, we let G<sub>c</sub> = (V, E<sub>c</sub>) be the projection of G on c, where E<sub>c</sub> = E ∩ (V × {c} × V) restricts the edge relation to color c. Given two vertices v, u ∈ V, we write v ⇝<sub>c</sub> u to denote that there is a path v ⇝ u in G<sub>c</sub>, and say that u is c-reachable from v in G. A colored vertex set is a set X ⊆ V × C. The restriction of G to X is the colored graph G[X] = (V′, C′, E′), where (i) V′ = {v : ∃c ∈ C. (v, c) ∈ X}, (ii) C′ = {c : ∃v ∈ V. (v, c) ∈ X}, and (iii) E′ = {(u, c, v) ∈ E : (u, c), (v, c) ∈ X}. Given such a set X, we let Pre(X) = {(u, c) : ∃(v, c) ∈ X. (u, c, v) ∈ E}, and Post(X) = {(u, c) : ∃(v, c) ∈ X. (v, c, u) ∈ E}. We call a set 𝒱 ⊆ V × C degenerate if for all c ∈ C, we have |𝒱 ∩ (V × {c})| ≤ 1, i.e., 𝒱 has at most one vertex per color. Given a degenerate set 𝒱, we let Fwd(𝒱) = {(v, c) : ∃(u, c) ∈ 𝒱 and u ⇝<sub>c</sub> v}, i.e., it is the set of colored vertices reached by each colored vertex in 𝒱. We similarly let Bwd(𝒱) = {(v, c) : ∃(u, c) ∈ 𝒱 and v ⇝<sub>c</sub> u}. Note that for degenerate sets, Fwd (Bwd) is the reflexive transitive closure of Post (Pre). Further, given a colored vertex set X, we let Fwd(𝒱, X) (resp., Bwd(𝒱, X)) be the set of colored vertices reached by (resp., reaching) each colored vertex in 𝒱 in the subgraph G[X].

Colored SCCs. Given a colored graph G = (V, C, E), a c-colored SCC of G is a set S = R × {c} ⊆ V × {c}, also written (R, c), such that R is an SCC of G<sub>c</sub>. Given a vertex v ∈ V and a color c ∈ C, we write SCC(v, c) for the SCC of v in G<sub>c</sub>. We let SCCs(G) denote the set of colored SCCs of G, and observe that SCCs(G) partitions V × C. A set X ⊆ V × C is SCC-closed if for every color c ∈ C, the set X ∩ (V × {c}) is SCC-closed in G<sub>c</sub>. Given an SCC-closed set X, we will also call G[X] SCC-closed. Given a degenerate set 𝒱, we write SCC(𝒱) for the set of SCCs {(R, c) : (v, c) ∈ 𝒱 and R = SCC(v) in G<sub>c</sub>}.

Symbolic operations. Similarly to the non-colored setting, we use symbolic operations Pre(X) and Post(X) on sets X ⊆ V ×C, which incur a unit time cost. We further perform unions, intersections and differences on subsets of V ×C, and use a specialized operation Pick(X) that returns an arbitrary pair (v, c) ∈ X. Finally, we consider at our disposal a function Pivots(X), that acts on sets X ⊆ V × C and returns a maximal degenerate subset of X containing one pair (v, c) per color c appearing in X. This operation can be performed by combining Pick with basic set operations, and has also appeared in other works [6].
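One way to realize Pivots from Pick and basic set operations is sketched below on explicit sets of (vertex, color) pairs. This is an assumption-laden illustration: the loop runs once per color, whereas a symbolic implementation may well proceed differently, and both `pick` (here `min`) and `pivots` are hypothetical names.

```python
def pick(xs):
    """Pick(X): return some element of xs (here the smallest, for determinism)."""
    return min(xs)


def pivots(xs):
    """Return a maximal degenerate subset of xs: one (v, c) pair per color."""
    chosen = set()
    rest = set(xs)
    while rest:
        v, c = pick(rest)
        chosen.add((v, c))
        # Set difference with V x {c}: drop every pair of the color just handled.
        rest = {(u, d) for (u, d) in rest if d != c}
    return chosen


example = pivots({(1, "r"), (2, "r"), (1, "g")})
```

On the example, the result contains exactly one pair for color "r" and one for color "g".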

### 4.2 The ColoredChain Algorithm

Here we present our extension of Chain for handling edge colored graphs.

    Algorithm 2: ColoredChain
       Input: A graph G = (V, C, E), two colored vertex sets X, K ⊆ V × C
     1 if X = ∅ then return
     2 𝒱 = Pivots(K ∪ (X \ (V × Colors(K))))         // A degenerate set of pivots
     3 F = ∅; Last = ∅; Layer = 𝒱; S = 𝒱
     4 while Layer ≠ ∅ do                            // Compute Fwd(𝒱, X)
     5     F = F ∪ Layer
     6     Last = Layer ∪ (Last \ (V × Colors(Layer)))
     7     Layer = Post(Layer) \ F
     8 while Pre(S) ∩ F ⊈ S do                       // Compute SCC(𝒱)
     9     S = S ∪ (Pre(S) ∩ F)
    10 output S
    11 ColoredChain(G[F \ S], F \ S, Last \ S)
    12 ColoredChain(G[X \ F], X \ F, Pre(S) \ F)

The ColoredChain algorithm. Algorithm 2 presents ColoredChain in pseudocode. The algorithm takes as input an edge-colored graph G = (V, C, E), as well as two colored vertex sets X and K (initially X = V × C and K = ∅). In words, the current and future recursive steps will compute the colored SCCs of G that are subsets of X. The set K serves the same purpose as in the basic Chain algorithm, i.e., it restricts the set of vertices from which we select pivots in the current recursive call, in order to obtain the linear-time properties of the algorithm. The algorithm starts by selecting a degenerate set of pivots 𝒱 in Line 2, with the goal of outputting each SCC(v, c), for (v, c) ∈ 𝒱, in the current recursive step. The pivot set is constructed to contain one pair (v, c) for every color c present in X. If c is also present in K, then the algorithm selects a pivot (v, c) ∈ K; otherwise, it chooses an arbitrary pivot from X. The algorithm then computes SCC(𝒱) as Fwd(𝒱, X) ∩ Bwd(𝒱, X), similarly to the non-colored case (where 𝒱 is simply a non-colored vertex). After the i-th iteration of the loop in Line 4, the variable Last contains, for each color c, the colored vertices (u, c) that are at maximum distance (at most i) from the pivot (v, c) ∈ 𝒱. As these maximal distances might converge at different lengths for different colors, extra care is taken in Line 6 to retain the already converged colors in the next iteration. Finally, the algorithm outputs SCC(𝒱) (Line 10), and proceeds recursively on the disjoint subsets F \ S and X \ F (Line 11 and Line 12). The K argument is passed to each recursive call in the same way as in the Chain algorithm, so that, in effect, the time taken to compute F is amortized over the time to output colored SCCs in subsequent recursive calls (where now the amortization also takes place among colors). Observe that, in the special case of p = 1 color, ColoredChain operates identically to Chain.

Correctness and complexity. Due to the similarity of ColoredChain to Chain, we only sketch the main arguments for its correctness and complexity. The key observation for correctness is that each recursive call processes an SCC-closed subgraph of G. Indeed, given an SCC-closed colored vertex set X, for any vertex (v, c) ∈ X, we have SCC(v, c) = Fwd({(v, c)}, X) ∩ Bwd({(v, c)}, X). Hence S = SCC(𝒱) in Line 10. As F ∪ S is closed under Post operations and S is a union of SCCs of X, we have that F \ S (and thus also X \ F) is SCC-closed.

The complexity of ColoredChain is O(∑<sub>S∈SCCs(G)</sub> (δ(S) + 1)) = O(pn), as every vertex v belongs to exactly one SCC(v, c) for each color c ∈ C. This bound follows from amortizing the number of iterations of the loop in Line 4 over the diameter of a color that converges last in the loop. Observe that the computation on the remaining colors comes "for free". This is the benefit of treating all colors symbolically (as opposed to each monochromatic graph G<sub>c</sub> separately). The same observation holds for the while loop in Line 8.
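The following Python sketch follows Algorithm 2 on explicit sets of (vertex, color) pairs, with edges given as (u, c, v) triples. It is meant only as a readable model under our own assumptions: Python sets replace BDDs, the inner `pivots` picks the smallest vertex per color, and the function name `colored_chain_sccs` is hypothetical.

```python
def colored_chain_sccs(vertices, colors, edges):
    """Return the colored SCCs as a set of (frozenset of vertices, color) pairs."""
    out = set()

    def colors_of(xs):
        return {c for (_, c) in xs}

    def post(xs, dom):
        return {(u, c) for (v, c, u) in edges if (v, c) in xs and (u, c) in dom}

    def pre(xs, dom):
        return {(v, c) for (v, c, u) in edges if (u, c) in xs and (v, c) in dom}

    def pivots(xs):
        chosen = {}
        for (v, c) in sorted(xs):
            chosen.setdefault(c, v)          # keep the first vertex seen per color
        return {(v, c) for c, v in chosen.items()}

    def rec(X, K):
        if not X:
            return
        k_colors = colors_of(K)
        piv = pivots(K | {(v, c) for (v, c) in X if c not in k_colors})
        F, last, layer, S = set(), set(), set(piv), set(piv)
        while layer:                          # forward phase for all colors at once
            F |= layer
            active = colors_of(layer)
            last = layer | {(v, c) for (v, c) in last if c not in active}
            layer = post(layer, X) - F
        while True:                           # backward phase for all colors at once
            new = (pre(S, X) & F) - S
            if not new:
                break
            S |= new
        for (_, c) in piv:                    # split the output per color
            out.add((frozenset(u for (u, d) in S if d == c), c))
        rec(F - S, last - S)
        rec(X - F, pre(S, X) - F)

    rec({(v, c) for v in vertices for c in colors}, set())
    return out


# Color "r": 0 <-> 1 -> 2.  Color "g": the cycle 0 -> 1 -> 2 -> 0.
EDGES = {(0, "r", 1), (1, "r", 0), (1, "r", 2),
         (0, "g", 1), (1, "g", 2), (2, "g", 0)}
colored = colored_chain_sccs({0, 1, 2}, {"r", "g"}, EDGES)
```

In the first call both colors are explored by the same Post and Pre operations, which is the sharing across colors that the complexity argument relies on.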

### 5 Experiments

In this section we report our experimental evaluation of the new algorithms Chain and ColoredChain on three classes of benchmarks. We compared their performance to the standard algorithms FwdBwd [30], LockStep [7] (and its recent colored variant [6]) and Skeleton [17]. Our experiments were run on a Linux machine with 2.4GHz CPU speed and 60GB of memory (using 1 core).

#### 5.1 Experiments on Synthetic Benchmarks

To better illustrate the behavior of the various algorithms, we start with a controlled setting of synthetic benchmarks.

Setup. We performed a controlled experiment on product graphs G<sup>i</sup><sub>k</sub> = L<sub>k−i</sub> × C<sub>i</sub>, where L<sub>j</sub> (resp., C<sub>j</sub>) denotes a line graph (resp., cycle graph) of size 2<sup>j</sup>. This setup follows [4]. Observe that G<sup>i</sup><sub>k</sub> has 2<sup>k−i</sup> SCCs, of size (and diameter) 2<sup>i</sup> each. Our implementation is in C++ and based on the Sylvan BDD library [14]. Recall that the behavior of each algorithm depends on the non-determinism involved in the Pick operation, which returns an arbitrary vertex of a given vertex set. Sylvan returns the vertex with the smallest (binary encoded) ID. We generated two variants of this setting: one in which vertex IDs follow an incremental order in each graph component, and one in which they are uniformly random.
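For readers who want to reproduce the shape of these inputs, the sketch below builds such a graph explicitly and checks the stated number and size of SCCs. The product is not spelled out in the text; we assume a Cartesian product (one step along the line or one step along the cycle per edge), which is consistent with the SCC counts above. The function names are hypothetical, and the SCC check uses plain explicit reachability rather than any of the symbolic algorithms.

```python
def product_graph(k, i):
    """Assumed Cartesian product of a line with 2**(k-i) and a cycle with 2**i vertices."""
    rows, cols = 2 ** (k - i), 2 ** i
    vertices = {(a, b) for a in range(rows) for b in range(cols)}
    edges = set()
    for (a, b) in vertices:
        edges.add(((a, b), (a, (b + 1) % cols)))     # step along the cycle
        if a + 1 < rows:
            edges.add(((a, b), (a + 1, b)))          # step along the line
    return vertices, edges


def reach(start, succ):
    """All vertices reachable from start via the adjacency map succ."""
    seen, todo = {start}, [start]
    while todo:
        x = todo.pop()
        for y in succ.get(x, ()):
            if y not in seen:
                seen.add(y)
                todo.append(y)
    return seen


def explicit_sccs(vertices, edges):
    """SCCs via forward/backward reachability on an explicit graph."""
    succ, pred = {}, {}
    for v, u in edges:
        succ.setdefault(v, []).append(u)
        pred.setdefault(u, []).append(v)
    remaining, result = set(vertices), []
    while remaining:
        v = next(iter(remaining))
        scc = reach(v, succ) & reach(v, pred)
        result.append(scc)
        remaining -= scc
    return result


V, E = product_graph(4, 2)
components = explicit_sccs(V, E)
```

With k = 4 and i = 2 this yields 16 vertices split into 4 SCCs of 4 vertices each, one per position on the line.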

Results. Fig. 2 shows the number of symbolic steps per algorithm, for graphs G<sup>i</sup><sub>10</sub>, i ∈ {0} ∪ [10]. When the vertex encoding follows sequential IDs (left), FwdBwd exhibits its worst-case Θ(n<sup>2</sup>) performance on graphs with many SCCs (i.e., small i), as it repeatedly picks pivots with large forward sets. As i increases, the number of SCCs decreases, and FwdBwd eventually terminates in the first call (for i = 10). On the other hand, the other algorithms exhibit almost identical, O(n) performance. In particular, every recursive call of LockStep picks a vertex v whose backward set equals SCC(v); thus the algorithm converges in a number of steps that is proportional to δ(SCC(v)), leading to Θ(n) performance. Finally, after the first call, Skeleton and Chain output SCCs in the reverse order of FwdBwd, performing in each call a number of symbolic steps that is proportional to the diameter of the SCC, like LockStep.

Fig. 2. Experimental results on product graphs G<sup>i</sup><sub>10</sub> = L<sub>10−i</sub> × C<sub>i</sub>.

When the vertex encoding follows random IDs (right), every recursive call of FwdBwd and LockStep picks a pivot whose first component is roughly in the middle of the line segment that is processed in that call. Hence the two algorithms have similar performance, which follows Θ(n log n) behavior for large lines (i.e., when i is small). On the other hand, Skeleton and Chain spend O(∑<sub>S∈SCCs(G)</sub> (δ(S) + 1)) symbolic steps. Naturally, for larger lines, the two algorithms spend more steps for computing the forward sets of their pivots, a cost that is amortized in later recursive calls by a constant factor. Observe, however, that Skeleton pays a larger constant factor, as the construction of skeletons requires the forward sets to also be traversed backwards. This results in Skeleton having the worst performance relative to the other algorithms when the number of SCCs decreases (i.e., as i gets larger), as there are fewer recursive calls over which to amortize the high cost of skeleton computation. Finally, we remark that for small and large i, Skeleton constructs (in expectation) Θ(n) BDDs, hence this is a family of graphs exposing the non-symbolic nature of the algorithm.

#### 5.2 Experiments on Uncolored Graphs

To better understand the performance of the various algorithms in the wild, we continue with their evaluation on standard model-checking benchmarks.

Setup. We considered benchmarks from the following categories:


In order to create equal experimental circumstances for all models, we used the language-independent model checker LTSmin [19] to generate the disjunctively partitioned symbolic transition relations for all these models. As symbolic representation, we chose the multi-core BDD package Sylvan [14]. We implemented all four algorithms of the previous section inside LTSmin. We disregarded graphs of size < 10<sup>4</sup>, as such graphs are handled more efficiently by explicit algorithms. This led to a pool of 101 benchmarks. We measured the average time (across three runs) each algorithm took on each benchmark, while discarding the overhead due to state-space generation.

Results. Fig. 3 shows the running times of Chain against Skeleton, LockStep and FwdBwd. Compared to the only other theoretically optimal algorithm, Skeleton, Chain is almost always somewhat faster, with the exception of one benchmark on which Chain is an order of magnitude faster. When compared to LockStep, we find the two algorithms to be incomparable, with Chain being slower on some benchmarks but faster on others. Indeed, we expect that LockStep behaves adequately in most practical scenarios, while its log n slowdown (as demonstrated in Section 5.1) is witnessed only rarely. Finally, we find that Chain consistently performs as well as or measurably better than FwdBwd.

#### 5.3 Experiments on Colored Graphs

Finally, we turn our attention to colored graphs. We used models of discrete control systems representing Biological Genetic Networks [20]. At a high level, a Boolean Network (BN) is defined by a set of Boolean variables X = {x<sub>1</sub>, …, x<sub>k</sub>} and update functions of the form x<sub>i</sub> := ϕ<sub>i</sub>, where each ϕ<sub>i</sub> is a Boolean combination over variables X. State updates are performed by nondeterministic applications of the functions ϕ<sub>i</sub>. In Colored Boolean Networks (CBNs), uninterpreted function symbols are used to represent uncertainty. For instance, x<sub>1</sub> := x<sub>2</sub> ∧ f(x<sub>3</sub>, x<sub>4</sub>) represents that x<sub>1</sub> has a positive dependence on x<sub>2</sub> and an unknown dependence on x<sub>3</sub> and x<sub>4</sub>. A single color corresponds to an assignment of Boolean functions to the uninterpreted function symbols. The set of colors is further restricted by constraints representing biological knowledge. This setting is inspired by its use to evaluate the recently introduced colored LockStep [6].
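To illustrate how a CBN gives rise to an edge-colored graph, the sketch below enumerates the colored edges of a toy network with two variables and updates x<sub>1</sub> := f(x<sub>2</sub>) and x<sub>2</sub> := x<sub>1</sub>, where f is an uninterpreted unary function. The network, the four color names, and the choice to omit self-loops are all our own assumptions for illustration; they are not taken from the benchmarks or from the AEON semantics.

```python
from itertools import product

# The four unary Boolean functions that f may denote; each one is a color.
COLORS = {
    "false": lambda b: False,
    "true": lambda b: True,
    "id": lambda b: b,
    "not": lambda b: not b,
}


def colored_edges():
    """Colored edges (state, color, state') of the toy network."""
    edges = set()
    for name, f in COLORS.items():
        for x1, x2 in product([False, True], repeat=2):
            targets = [
                (f(x2), x2),      # apply the update x1 := f(x2)
                (x1, x1),         # apply the update x2 := x1
            ]
            for target in targets:
                if target != (x1, x2):        # assumption: omit self-loops
                    edges.add(((x1, x2), name, target))
    return edges


TOY_EDGES = colored_edges()
```

Each state has at most two outgoing edges per color, one per variable update, and the four colors share most of their structure, which is the situation where a joint symbolic encoding is expected to pay off.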

Setup. We implemented our new ColoredChain algorithm in Scala, using JavaBDD (wrapping the classical BDD package BuDDy) with recommended settings. We also reimplemented colored LockStep from [6] (without preprocessing) and FwdBwd in Scala/JavaBDD. We used the CBNs coming from the GINsim Boolean network database [10], represented in the AEON format that supported the experiments in [6], accessed at [1]. We focused on benchmarks with np ≥ 10<sup>4</sup>, as the rest were run in < 0.2s by all algorithms. We remark that most of these CBNs generate huge graphs; for the purposes of our evaluation, we ran our experiments with a timeout of 1h, which yielded a pool of 9 benchmarks.

Fig. 4. Experimental results on colored graphs from AEON models (seconds).

Results. Fig. 4 shows the running time of each of the three algorithms. Perhaps surprisingly, LockStep is consistently the slowest and by a large margin. On the other hand, ColoredChain was always considerably faster than LockStep, and consistently the fastest algorithm overall. The two exceptions are on the CBNs 5\_param\_g2a and 27\_068, where FwdBwd finished first in 2s and 1032s (as opposed to 4s and 1114s for ColoredChain). On the other hand, FwdBwd was considerably slower than ColoredChain in some CBNs (e.g., 20\_049). Although a wider experimental setting is required for conclusive results, our evaluation indicates that ColoredChain is very effective in handling CBNs.

### 6 Conclusion

We have introduced Chain, a new, truly symbolic, and time-optimal algorithm for SCC decomposition. The simplicity of Chain makes it theoretically elegant, while our experimental evaluation demonstrates a potential for practical impact. Some opportunities for future research include introducing saturation techniques [31] to Chain, as well as specializing it to the computation of bottom SCCs, which have received special attention [5].

Acknowledgements. This work was supported in part by Villum Fonden (Project VIL42117).

### References



## Transforming Quantified Boolean Formulas Using Biclique Covers

Oliver Kullmann<sup>1,⋆</sup> and Ankit Shukla<sup>2</sup>

<sup>1</sup> Swansea University, Swansea, UK, O.Kullmann@Swansea.ac.uk

<sup>2</sup> Johannes Kepler University, Austria, ankit.shukla@jku.at

Abstract. We introduce the global conflict graph of DQCNFs (dependency quantified conjunctive normal forms), recording clashes between clauses on such universal variables on which all existential variables depend (called "global variables"). The biclique covers of this graph correspond to the eligible clause-slices of the DQCNF which consider only the global variables. We show that all such slices yield satisfiability-equivalent variations. This opens the possibility to realise this slice using as few global variables as possible. We give basic theoretical results and first supporting experimental data.

Keywords: QBF solving, DQBF, 2QCNF, biclique cover problem, conflict graph, preprocessing, Horn clause-sets, minimal unsatisfiability

### 1 Introduction

The last two decades have seen enormous progress in quantified Boolean formula (QBF) theory and technology, as witnessed by the Handbook chapters [2,14]. Core areas are preprocessing techniques, result validation of the solvers, strategy extraction, and theoretical lower bounds. There are many applications in the areas of artificial intelligence, planning, two-player gaming and synthesis; see the overview [25]. This progress is complemented by the annual QBF competition called QBFEval (see [21]). A special class of QBF, 2QBF, is used to model problems with simple quantifier structure (see [1,24] for basic references). In the other direction, the more expressive logic DQBF has also seen recent progress in this decade; see for example [13,26,3,12]. Here solving techniques from SAT and QBFs are generalised, including preprocessing, strategy extraction and circuit synthesis. We recall the central complexity classes covered here: SAT is NP-complete, 2QBF is Π<sup>P</sup><sub>2</sub>-complete, QBF is PSPACE-complete, and DQBF is NEXPTIME-complete. In our paper we rely on the CNF-structure, and thus we will use 2QCNF instead of 2QBF, and DQCNF instead of DQBF.

In our paper we present a new, at first sight astonishing, but essentially simple theoretical insight into general DQCNFs, which enables transformations of problem instances, maintaining satisfiability-equivalence. We consider "global variables", universal variables on which every existential variable depends, and the corresponding "slice" of the CNF (the parts of the clauses using these variables). The main insight is that we can replace this global slice by any other global slice (using completely different variables and clauses), with the only condition that the conflict (clashing) patterns between global literals need to be maintained. These conflict patterns can be represented by bicliques in graphs, with one biclique corresponding to one variable with its positive and negative occurrences, establishing the two sides of the biclique (where all vertices from the two sides are connected). In this way the tools of the theory of biclique (edge) covers (and also biclique partitions) of graphs can be used to find "better" global slices. A natural first metric for "better" is to use fewer bicliques, and the corresponding decision problem, whether a graph has a biclique cover using at most a given number of bicliques, is the NP-complete Problem GT18 in the classical book [11]. The smallest number of bicliques needed to cover a graph is called the biclique cover number, or also the bipartite dimension. In our context there is a very natural alternative point of view of biclique-covers/partitions, namely representing bicliques by boolean variables in CNFs, and then instead of a biclique-cover we just have a CNF realising the graph, which means its conflict graph is the given graph; now "fewer bicliques" means "fewer variables". This has apparently been first explored in [18,10]. The potential applications of this new transformation (changing the global slice) are in preprocessing for solving, and also the proof complexity aspect seems very interesting: how much do such changes affect the complexity of the formula?

<sup>⋆</sup> Supported by EPSRC grant EP/S015523/1.

© The Author(s) 2023. S. Sankaranarayanan and N. Sharygina (Eds.): TACAS 2023, LNCS 13994, pp. 372–390, 2023. https://doi.org/10.1007/978-3-031-30820-8_23

We now run through a simple example, which shows the main topic of the paper in a nutshell: Using graph theory connected to CNFs to lower the number of (certain) universal variables in a DQCNF.

#### 1.1 Using fewer universal variables

Consider the DQCNF **F** with four universal and two existential variables

$$\mathbf{F} := \forall x_1, x_2, x_3, x_4 \; \exists y_1(x_1, x_2, x_3) \; \exists y_2(x_1, x_2, x_3, x_4) : F,$$

where F := (y<sub>1</sub> ∨ x<sub>2</sub> ∨ x<sub>3</sub>) ∧ (¬y<sub>1</sub> ∨ x<sub>1</sub> ∨ ¬x<sub>2</sub>) ∧ (¬y<sub>2</sub> ∨ ¬x<sub>1</sub> ∨ ¬x<sub>2</sub> ∨ ¬x<sub>3</sub> ∨ x<sub>4</sub>) ∧ (y<sub>2</sub> ∨ ¬x<sub>4</sub>). The universal variables of **F** are x<sub>1</sub>, x<sub>2</sub>, x<sub>3</sub>, x<sub>4</sub>, the existential variables are y<sub>1</sub>, y<sub>2</sub>, with their dependencies shown in brackets. **F** has a solution: y<sub>1</sub> = ¬x<sub>2</sub>, y<sub>2</sub> = x<sub>4</sub> (which makes all clauses tautologies). A central concept for this paper is that of a global variable, which is a universal variable such that all existential variables depend on it. The global variables of **F** are x<sub>1</sub>, x<sub>2</sub>, x<sub>3</sub>. The sub-clauses given by the global variables yield the global slice, which is denoted by gsl(**F**) (switching from logical to clause-notation; the global slice is just a CNF-clause-set):

$$\mathrm{gsl}(\mathbf{F}) = \left\{ \{x_2, x_3\}, \{x_1, \overline{x_2}\}, \{\overline{x_1}, \overline{x_2}, \overline{x_3}\}, \emptyset \right\}.$$

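The second central concept of this paper is the global conflict graph gcg(**F**), which is the conflict graph of the global slice: the clauses are the vertices, and an edge connects clauses iff they have clashing literals. Here this is the triangle on the three nonempty clauses of gsl(**F**), together with the empty clause as an isolated vertex.

Note that indeed we have a graph, and there is only one edge between {x<sub>2</sub>, x<sub>3</sub>} and {x̄<sub>1</sub>, x̄<sub>2</sub>, x̄<sub>3</sub>} (not two), although these clauses clash in both x<sub>2</sub> and x<sub>3</sub>. Now the basic insight of our paper (Corollary 2) is: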

> Any clause-set realising the conflict-graph can be used instead of the (given) global slice.

Here by "realising" we just mean that the clause-set has the given conflict-graph. In our case, the triangle can be realised with just two variables x<sub>1</sub>, x<sub>2</sub>, yielding

$$\left\{ \{x_1\}, \{\overline{x_1}, x_2\}, \{\overline{x_1}, \overline{x_2}\} \right\}.$$

This triangle-realisation is Horn, minimally unsatisfiable, with one clause more than variables (we will show that this is always available). We obtain the new DQCNF **F**′ (which is satisfiability-equivalent to **F**, also shown for comparison):

$$\mathbf{F} = \forall\, \boxed{x_1, x_2, x_3}\; x_4 \; \exists y_1(x_1, x_2, x_3) \; \exists y_2(x_1, x_2, x_3, x_4) : F$$

$$F = (y_1 \lor \underline{x_2 \lor x_3}) \land (\neg y_1 \lor \underline{x_1 \lor \neg x_2}) \land (\neg y_2 \lor \underline{\neg x_1 \lor \neg x_2 \lor \neg x_3} \lor x_4) \land (y_2 \lor \neg x_4)$$

$$\mathbf{F}' := \forall\, \boxed{x_1, x_2}\; x_4 \; \exists y_1(x_1, x_2) \; \exists y_2(x_1, x_2, x_4) : F'$$

$$F' := (y_1 \lor \underline{x_1}) \land (\neg y_1 \lor \underline{\neg x_1 \lor x_2}) \land (\neg y_2 \lor \underline{\neg x_1 \lor \neg x_2} \lor x_4) \land (y_2 \lor \neg x_4),$$

where a solution now is y<sub>1</sub> = ¬x<sub>1</sub>, y<sub>2</sub> = x<sub>4</sub>. In general we are aiming at reducing the number of global variables, by using a smaller CNF-realisation of the global conflict graph. Since minimising the number of global variables is NP-hard, for this first study we only consider fixed predetermined replacement-schemes.
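The example can be replayed mechanically. The following Python sketch computes the global variables, the global slice and the global conflict graph of both DQCNFs and confirms that the two graphs coincide. The integer encoding of literals (x<sub>1</sub>, …, x<sub>4</sub> as 1, …, 4, y<sub>1</sub> as 5, y<sub>2</sub> as 6, negation as a minus sign) and all function names are assumptions of this sketch.

```python
from itertools import combinations

def global_variables(universals, dependencies):
    """Universal variables on which every existential variable depends."""
    return {v for v in universals if all(v in d for d in dependencies.values())}

def global_slice(clauses, gvars):
    """Restrict every clause to its literals over the global variables."""
    return [frozenset(l for l in c if abs(l) in gvars) for c in clauses]

def conflict_graph(clauses):
    """Edges {i, j} between clause indices whose clauses contain clashing literals."""
    return {frozenset((i, j))
            for (i, c), (j, d) in combinations(enumerate(clauses), 2)
            if any(-l in d for l in c)}

# Original DQCNF: clauses and dependency-sets of y1 (5) and y2 (6).
F = [{5, 2, 3}, {-5, 1, -2}, {-6, -1, -2, -3, 4}, {6, -4}]
deps = {5: {1, 2, 3}, 6: {1, 2, 3, 4}}
# Transformed DQCNF, using only x1, x2 as global variables.
F_new = [{5, 1}, {-5, -1, 2}, {-6, -1, -2, 4}, {6, -4}]
deps_new = {5: {1, 2}, 6: {1, 2, 4}}

gvars = global_variables({1, 2, 3, 4}, deps)
gvars_new = global_variables({1, 2, 4}, deps_new)
gcg = conflict_graph(global_slice(F, gvars))
gcg_new = conflict_graph(global_slice(F_new, gvars_new))
```

Both `gcg` and `gcg_new` are the triangle on the clause indices 0, 1, 2, with index 3 (the clause without global literals) isolated.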

#### 1.2 Overview

In Section 2 we present basic definitions related to logic and graph theory. Especially the conflict graph of clause-sets is given in Definition 1, and in Subsection 2.2 we discuss biclique-covers/partitions, and how they relate to conflict graphs (Lemma 1). Section 3 then discusses the semantics of global variables in DQCNFs. Theorem 1 spells out the basic fact that global variables can be expanded (they can be eliminated by considering all assignments to them), and that the results are captured by independent (clash-free) sets of the global conflict graph. In Definition 7 we make precise what it means that one DQCNF is obtained from another one by replacing the global slice with an equivalent one, namely having the same global conflict graph, and being the same after removal of the global slices. Corollary 2 then says that such DQCNFs are satisfiability-equivalent.

In Section 4 we study the most basic realisations, "precise" and "imprecise" ones, the former realising precisely the number of given parallel edges in a given multigraph. We start in Subsection 4.1 by using "full clause-sets", which are clause-sets where all clauses contain the same variables. So these are (imprecise) realisations of complete graphs, and indeed contain optimal ones (always w.r.t. the number of variables). In Subsection 4.2 we consider the trivial realisations, where every clash is realised by one new variable with one positive and one negative occurrence. A new perspective on basic realisations by "singular variables", which occur in one sign only once, is then presented in Subsection 4.3. In Lemma 2 we give a simple generation process for the class of Horn minimally unsatisfiable clause-sets (HMUs), and, exploiting this, in Theorem 2 we show that every graph has a precise realisation by HMUs, computable in linear time. In Corollary 4 we obtain that every DQCNF with m clauses can be transformed in linear time into a satisfiability-equivalent one with only the global slice changed, so that now there are at most m − 1 global variables, using for each connected component of the global conflict graph a (variable-disjoint) HMU.

We now come to the experimental part of the paper. In Section 5 we present the first instance of a general scheme for generating 2QCNFs, which are DQCNFs of the form ∀X∃Y : F, where X is the set of global variables, and Y the set of existential variables. The general scheme starts with a graph G with m vertices, and chooses some realisation F of G. One chooses the number C ≥ 1 of connected components of the (overall) global conflict graph, consisting of C vertex-disjoint copies of G, realised by C variable-disjoint copies of F. This yields altogether C · m clauses. On these C · m clauses finally the existential slice is created, with n variables, which makes altogether three parameters (C, m, n). For the graphs G we choose complete graphs, and for the realisations the trivial realisation, the (unique) HMU realisation, and the (optimum) log (full) realisation, considering only powers of two: m = 2<sup>p</sup>. Finally for the existential slice we create random 3-CNFs. The basic question we want to explore is Hypothesis SIB: is using fewer global variables better for solving? We run two leading solvers on a selection of benchmark sets, which is presented in Section 6; see [19] for the benchmarks. To a large extent SIB is validated; we found only one parameter triple where the HMU-realisation could have some edge over the log-realisation, and present the finding. We conclude in Section 7 with future research directions.

#### 2 Preliminaries

#### 2.1 Logic

We have an infinite set of variables to start with; these variables can be used as universal or existential (boolean) variables in DQCNFs (see below), or just as plain (boolean) variables in clause-sets. We usually write v for a variable, using x for literals, with x̄ the complement of a literal ("negation"). A clause C is a (finite) set of literals not containing clashing literals, that is, there is no x ∈ C with x̄ ∈ C. Using L̄ := {x̄ : x ∈ L} for a set L of literals, clash-freeness of clauses C means the condition C ∩ C̄ = ∅. A clause-set F is a finite set of clauses. We use var(x) for the underlying variable of a literal x, var(C) := {var(x) : x ∈ C} for the set of variables occurring (positively or negatively) in a clause C, and var(F) := ⋃<sub>C∈F</sub> var(C) for the set of variables occurring in F. As measures for clause-sets F we use (taking values in ℕ<sub>0</sub> = {x ∈ ℤ : x ≥ 0}) the number of variables n(F) := |var(F)|, the number of clauses c(F) := |F|, and the deficiency δ(F) := c(F) − n(F).


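Since in general we cannot avoid having clauses with multiplicity, and we want to name clauses, we also use labelled clause-sets, which are pairs (L, F), where L is the (finite) set of (clause-)labels, and F is a map with domain L, mapping every label l ∈ L to a clause F(l). An ordinary clause-set F is converted into a labelled clause-set by using F as the label-set, and using the identity on F as clause-map. A DQCNF is a 4-tuple **F** = (A, E, F, D), where A is the (finite) set of universal variables, E is the (finite) set of existential variables, disjoint from A, F is a clause-set with var(F) ⊆ A ∪ E, and D maps every existential variable v ∈ E to its dependency-set D(v) ⊆ A.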


A satisfying (total) assignment of **F** is a map Φ with domain E, where Φ(v) is a boolean function over the variables D(v), such that F after substitution via Φ becomes a tautology (over A), where F is understood as a CNF (a conjunction of clauses, where a clause is a disjunction of literals). A DQCNF **F** is satisfiable if it has a satisfying assignment, otherwise **F** is unsatisfiable. Two DQCNFs are satisfiability-equivalent if either both are satisfiable or both are unsatisfiable.

#### 2.2 Graphs

We use $\binom{V}{2}$ to denote the set of 2-element subsets of a set V. A graph is a pair (V, E), with V the (finite) vertex-set, and E ⊆ $\binom{V}{2}$ the edge-set (undirected, no parallel edges or (self-)loops). More generally, a multigraph is a pair (V, E), with V as before, while E : $\binom{V}{2}$ → ℕ<sub>0</sub> maps every potential edge to its multiplicity (a natural number ≥ 0). An ordinary graph is converted into a multigraph by using the characteristic function of the edge-set. In the other direction, the underlying graph of a multigraph (V, E) has the edge {v, w} iff E({v, w}) ≥ 1. We use V(G) for the vertex-set of a (multi)graph G, and E(G) for the edge-set of a graph G resp. for the edge-function of a multigraph G. An independent set I ⊆ V(G) of a (multi)graph G has no edge e ∈ E(G) with e ⊆ I (resp., for a multigraph, no e ⊆ I with E(G)(e) ≥ 1). For the number of vertices we use |V(G)| ∈ ℕ<sub>0</sub>, while for the number of edges we use |E(G)| ∈ ℕ<sub>0</sub>, which for a multigraph G is defined as $|E(G)| := \sum_{e \in \binom{V(G)}{2}} E(G)(e)$, that is, as the sum of edge-multiplicities. K<sub>n</sub> is the complete graph with n ∈ ℕ<sub>0</sub> vertices, that is, V(K<sub>n</sub>) = {1, …, n} and E(K<sub>n</sub>) = $\binom{V(K_n)}{2}$ (thus |E(K<sub>n</sub>)| = ½ n(n − 1)).

Definition 1. Consider a labelled clause-set (L, F). The conflict multigraph cmg(F) is the multigraph with vertex-set L, where the multiplicity of an edge {a, b} (for labels a, b ∈ L) is $|F(a) \cap \overline{F(b)}|$, that is, the number of clashing literals between the clauses of a and b. The conflict graph cg(F) is the underlying graph of cmg(F). A labelled clause-set (L, F) precisely-realises a multigraph G, if cmg(L, F) = G, and realises a graph G, if cg(L, F) = G.

We write "precisely-realise" instead of "precisely realise" to avoid grammatical ambiguity (as in "that precisely realises what I want").

A biclique in a multigraph G is a pair (A, B) of disjoint vertex sets A, B ⊆ V(G), such that all a ∈ A are adjacent with all b ∈ B. The corresponding characteristic function maps exactly the edges {a, b} with a ∈ A, b ∈ B to 1 (all other edges to zero). A biclique partition of G is a family ((A<sub>i</sub>, B<sub>i</sub>))<sub>i∈I</sub> of bicliques in G, such that the sum of the characteristic functions equals the edge-function of G, while for a biclique cover of G that sum needs to be zero exactly for the non-edges. For graphs G a biclique represents the corresponding set of edges of G, and a biclique partition yields a partitioning of the edge-set, while a biclique cover has as its union the edge-set. For (multi)graphs G we denote by bcp(G) ∈ ℕ<sub>0</sub> resp. bcc(G) ∈ ℕ<sub>0</sub> the minimum number of bicliques in a biclique partition resp. cover of G. For an overview on the complexity of computing bcp(G) and bcc(G) see [9,4,7]. That boolean clause-sets yield a natural environment for biclique partitions (and covers) was apparently first realised in [18]:

Lemma 1. For a multigraph G the biclique partitions resp. biclique covers correspond, up to handling of degenerations, to precise-realisations resp. realisations of G by labelled clause-sets (Definition 1), with the bicliques corresponding to the variables and their positive and negative occurrences. bcp(G) is the minimal number of variables in a precise-realisation of G, while bcc(G) is the minimal number of variables in a realisation of G.

We are mostly interested in (imprecise-)realisations, since we are interested in using realisations F with as few variables as possible (i.e., minimising n(F), which is equivalent to maximising δ(F)). However, precise-realisations can also be of interest, since they are smaller with regard to the number of literal occurrences.

With the example from Subsection 1.1 we have already seen two different realisations of the triangle K<sub>3</sub> (thus using the label-set {1, 2, 3}), namely first using three variables in 1 ↦ {x<sub>2</sub>, x<sub>3</sub>}, 2 ↦ {x<sub>1</sub>, x̄<sub>2</sub>}, 3 ↦ {x̄<sub>1</sub>, x̄<sub>2</sub>, x̄<sub>3</sub>}, corresponding to the biclique cover by the three bicliques ({2}, {3}), ({1}, {2, 3}), ({1}, {3}), and second using two variables in 1 ↦ {x<sub>1</sub>}, 2 ↦ {x̄<sub>1</sub>, x<sub>2</sub>}, 3 ↦ {x̄<sub>1</sub>, x̄<sub>2</sub>}, corresponding to the biclique cover by the two bicliques ({1}, {2, 3}), ({2}, {3}). The latter is a precise-realisation (the cover is a partition).
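The correspondence of Lemma 1 can be made concrete for these two realisations of K<sub>3</sub>. The following Python sketch translates in both directions; the integer encoding of literals (variable x<sub>i</sub> as i, its complement as −i) and the function names are assumptions of this sketch, and the degenerate case of a variable occurring in one sign only simply yields a biclique with one empty side.

```python
def bicliques_of(labelled):
    """One pair (positive side, negative side) of label-sets per variable."""
    result = {}
    for label, clause in labelled.items():
        for lit in clause:
            pos, neg = result.setdefault(abs(lit), (set(), set()))
            (pos if lit > 0 else neg).add(label)
    return result

def realisation_of(labels, bicliques):
    """Labelled clause-set in which the i-th biclique (A_i, B_i) becomes variable i:
    positive in the clauses labelled by A_i, negative in those labelled by B_i."""
    clauses = {label: set() for label in labels}
    for v, (a_side, b_side) in enumerate(bicliques, start=1):
        for label in a_side:
            clauses[label].add(v)
        for label in b_side:
            clauses[label].add(-v)
    return clauses

# First realisation of the triangle, with three variables.
first = {1: {2, 3}, 2: {1, -2}, 3: {-1, -2, -3}}
cover = bicliques_of(first)
# Second realisation, built from the two bicliques ({1}, {2, 3}), ({2}, {3}).
second = realisation_of({1, 2, 3}, [({1}, {2, 3}), ({2}, {3})])
```

Here `cover` maps x<sub>1</sub>, x<sub>2</sub>, x<sub>3</sub> to the three bicliques listed above, and `second` is the two-variable realisation.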

### 3 The global conflict graph

We now study the simplest type of universal variables of a DQCNF, called "global variables", which are the variables every existential variable depends on. In the final result, Corollary 2, we will see that concerning satisfiability (at all), all that matters about global variables is the clashes they create between the clauses.

Definition 2. A global variable of a DQCNF **F** = (A, E, F, D) is a universal variable, such that every existential variable depends on it. We denote the set of all global variables by gvar(**F**) := {v ∈ A : ∀ w ∈ E : v ∈ D(w)}.

We note that the notion of a global variable does not depend on the clauses. A DQCNF might not have any global variable. For a 2QCNF the global variables are all the universal variables, i.e., gvar(A, E, F, D) = A (that is indeed the definition of 2QCNF). In order to access the clause-parts with global literals, we consider a DQCNF as "sliced up" by their variable-sets, for example for a QCNF ∃X∀Y ∃Z : F we have three natural slices, for X, Y, Z.

Definition 3. For a DQCNF **F** = (A, E, F, D) and some set V ⊆ A ∪ E of variables, the V-slice is the labelled clause-set (F, F<sub>V</sub>) (using the clauses of F as labels), such that the clause of label C ∈ F is F<sub>V</sub>(C) := C[V] := {x ∈ C : var(x) ∈ V}. The global slice of **F** is the gvar(**F**)-slice, denoted by gsl(**F**).

Combining Definitions 1, 2, and 3, we obtain the "global conflict graph" as the conflict graph of the global slice:

Definition 4. For a DQCNF **F** = (A, E, F, D) the global conflict graph resp. multigraph is gcg(**F**) := cg(gsl(**F**)) resp. gcmg(**F**) := cmg(gsl(**F**)).

The vertices of the global conflict (multi)graph are the clauses, with the edges corresponding to clashes between literals over global variables. Note that the realisations of the global conflict graph are the same as the realisations of the global conflict multigraph (for realisations, multiplicities of edges are irrelevant).

We need the ability to remove the global variables (obtaining another DQCNF), for which we introduce the following notation:

Definition 5. For a DQCNF **F** = (A, E, F, D) let V := gvar(**F**) be the set of global variables, while V′ := (A ∪ E) \ V is the set of other variables. We define

$$\mathrm{mgvar}(\mathbf{F}) := (A \setminus V,\; E,\; \{C - V\}_{C \in F},\; (D(v) \setminus V)_{v \in E}),$$

with "m" for "minus", which is the DQCNF obtained by removing the global variables from its universal variables (removing all literals with underlying global variable). Here C − V := C[V′] (removing all literals with variables from V).

The semantic contribution of global variables is captured by the global-clash-free sub-clause-sets and their related sub-DQCNFs:

Definition 6. Consider a DQCNF **F** = (A, E, F, D). A globally-independent sub-clause-set of **F** is a clause-set F′ ⊆ F which is an independent subset of gcg(**F**) (that is, the global variables of **F** are all pure variables, appearing only in one sign, in F′). A globally-independent sub-DQCNF is some mgvar(A, E, F′, D) for some globally-independent sub-clause-set F′. Speaking of maximal globally-independent, we restrict the F′ ⊆ F to maximal independent subsets of gcg(**F**). The set of all maximal globally-independent sub-DQCNFs is denoted by gind(**F**), and two DQCNFs **F**, **F**′ are called gind-equivalent if gind(**F**) = gind(**F**′).

We note that two gind-equivalent DQCNFs have the same existential variables, and that gind-equivalence is indeed an equivalence relation. We now come in Theorem 1 to the basic observation about the role played by global literals (literals whose underlying variables are global). Most basic is the insight that global variables are exactly the variables which always allow reducing the problem by substituting all possible truth values, which we illustrate by a simple example:

Example 1. Let A := {a}, E := {x}, F := {{a, x̄}, {ā, x}}, D<sub>1</sub> := (x ↦ A), D<sub>2</sub> := (x ↦ ∅), and finally **F**<sub>i</sub> := (A, E, F, D<sub>i</sub>) for i = 1, 2. Less formally, we have two QCNFs: **F**<sub>1</sub> ≙ ∀a ∃x : F and **F**<sub>2</sub> ≙ ∃x ∀a : F, where F ≙ a ↔ x. Obviously **F**<sub>1</sub> is satisfiable, with the unique solution x ≙ a, while **F**<sub>2</sub> is unsatisfiable.

We have gvar(**F**<sub>1</sub>) = {a}, while gvar(**F**<sub>2</sub>) = ∅. Substituting a ↦ 0 into **F**<sub>1</sub> or **F**<sub>2</sub> yields in both cases the DQCNF G<sub>0</sub> = ∃x : ¬x, while a ↦ 1 yields G<sub>1</sub> = ∃x : x. G<sub>ε</sub> has the unique solution x ≙ ε for ε ∈ {0, 1}. For **F**<sub>1</sub> we are then able to get a solution for x, since x depends on a, and thus we can select the appropriate solution from G<sub>ε</sub>, depending on the value ε. In contrast, x does not depend on a in **F**<sub>2</sub>, and thus we could only lift the solutions from G<sub>0</sub>, G<sub>1</sub> to **F**<sub>2</sub> if they were the same in both cases.

The vertices of the global conflict graphs of **F**<sub>1</sub>, **F**<sub>2</sub> are the two clauses, which in **F**<sub>1</sub> are connected by an edge, while in **F**<sub>2</sub> they are isolated. So gind(**F**<sub>1</sub>) ≙ {∃x : ¬x, ∃x : x}, while gind(**F**<sub>2</sub>) = {**F**<sub>2</sub>}.

Theorem 1. A DQCNF **F** = (A, E, F, D) is unsatisfiable iff there is some unsatisfiable maximal globally-independent sub-DQCNF of **F**.

Proof. We show the equivalent statement: **F** is satisfiable iff all maximal globally-independent sub-DQCNFs are satisfiable.

Let V := gvar(**F**). **F** is satisfiable iff for all boolean total assignments ϕ : V → {0, 1}, after substitution of ϕ into **F**, the resulting DQCNF ϕ ∗ **F** := (A \ V, E, ϕ ∗ F, (D(v) \ V)<sub>v∈E</sub>) is satisfiable, where ϕ ∗ F is the usual application of a partial assignment to a clause-set (removing all satisfied clauses, and removing the falsified literals from the remaining clauses): the direction from left to right holds for all partial assignments to universal variables, while the direction from right to left uses that the variables in V are global, and thus the boolean functions used in a satisfying assignment of a DQCNF can be made dependent on them. Now the clauses of ϕ ∗ F come from an independent subset of gcg(**F**), since an edge, that is a clash, would cause one of the two clauses involved to be satisfied. And for every maximal independent subset F′ we can find ϕ : V → {0, 1} satisfying exactly all clauses in F \ F′, by setting all global literals occurring in F′ to 0 (by maximality every clause in F \ F′ clashes with some clause in F′, and is thus satisfied). Thus the maximal independent F′ ⊆ F cover exactly the relevant (maximal) cases of ϕ ∗ F, which shows the assertion. □
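The combinatorial core of this proof can be checked by brute force on small global slices. The following Python sketch (function names and the integer encoding of literals are assumptions; both functions are exponential and meant for tiny examples only) compares the clause-sets surviving the total assignments to the global variables with the maximal independent sets of the conflict graph, using the global slice from Subsection 1.1.

```python
from itertools import combinations, product

def clash(c, d):
    """True iff the clauses c and d contain a pair of complementary literals."""
    return any(-lit in d for lit in c)

def maximal_independent_sets(clauses):
    """All inclusion-maximal sets of clause indices without a clashing pair."""
    n = len(clauses)
    independent = [frozenset(s)
                   for k in range(n + 1) for s in combinations(range(n), k)
                   if not any(clash(clauses[i], clauses[j])
                              for i, j in combinations(s, 2))]
    return {s for s in independent if not any(s < t for t in independent)}

def surviving_sets(clauses):
    """For each total assignment to the variables of the slice, the set of
    indices of the clauses that are not satisfied by it."""
    variables = sorted({abs(l) for c in clauses for l in c})
    result = set()
    for bits in product((False, True), repeat=len(variables)):
        phi = dict(zip(variables, bits))
        result.add(frozenset(i for i, c in enumerate(clauses)
                             if not any(phi[abs(l)] == (l > 0) for l in c)))
    return result

# Global slice of the example from Subsection 1.1 (x1, x2, x3 encoded as 1, 2, 3).
gsl = [{2, 3}, {1, -2}, {-1, -2, -3}, set()]
mis = maximal_independent_sets(gsl)
surviving = surviving_sets(gsl)
maximal_surviving = {s for s in surviving if not any(s < t for t in surviving)}
```

For this slice every surviving set is independent, and the inclusion-maximal surviving sets are exactly the three maximal independent sets {0, 3}, {1, 3}, {2, 3}.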

Thus **F** is unsatisfiable iff gind(**F**) contains an unsatisfiable element:

Corollary 1. Gind-equivalence implies sat-equivalence, that is, if for DQCNFs **F**, **F**′ we have gind(**F**) = gind(**F**′), then **F** is satisfiable iff **F**′ is satisfiable.

A sufficient condition for **F**, **F**′ being gind-equivalent is that **F**′ is obtained from **F** by replacing the global slice in such a way that the global conflict graph is maintained. The precise concept is captured by "global-conflict-graph-equivalence":

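Definition 7. Two DQCNFs **F** = (A, E, F, D), **F**′ = (A′, E′, F′, D′) are gcg-equivalent if the following conditions hold: (1) mgvar(**F**) = mgvar(**F**′); (2) gcg(**F**) = gcg(**F**′), where the clauses of **F** and **F**′ are identified via the clauses of mgvar(**F**) = mgvar(**F**′).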


The first condition of Definition 7 says that after removal of the global variables, we have exactly the same DQCNFs, while the second condition says that the global literals inserted into the clauses of mgvar(**F**) = mgvar(**F**′) yield exactly the same conflict-pattern (and thus the same independent subsets).

Corollary 2. Gcg-equivalence implies gind-equivalence. Thus two gcg-equivalent DQCNFs are sat-equivalent.

In the following Section 4 we will consider the problem of constructing gcg-equivalences. This is just a study of graphs G and their (CNF-)realisations, since all that matters here is the global slice of a DQCNF, which is just a boolean CNF. Furthermore we only need to consider connected graphs, since every connected component of G can be handled separately.

### 4 Realisations

We now introduce the three most basic classes of realisations of multigraphs:


These three representation-classes are based on three classes of clause-sets:

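Definition 8. For a clause-set F we introduce the following special cases of variables v ∈ var(F): v is full if it occurs in every clause of F; v is 1-singular if it occurs exactly once positively and exactly once negatively; and v is singular if it occurs in one sign exactly once.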


A clause-set having only full resp. 1-singular resp. singular variables is a full resp. totally 1-singular resp. totally singular clause-set.

#### 4.1 Full clause-sets

Any full clause-set F realises the complete graph K<sub>c(F)</sub> with c(F) many vertices. Indeed the realisations of the complete graphs are exactly the hitting clause-sets (any two clauses clash; as DNFs also known as orthogonal or disjoint DNFs), and we will see in Corollary 5 another class of hitting clause-sets. For a complete graph K<sub>m</sub> with m ∈ ℕ vertices, it is well-known ([8]) that bcc(K<sub>m</sub>) = ⌈lg(m)⌉, where lg(m) is the binary logarithm of m. Such optimal realisations F are obtained from the canonical (full) clause-set with n := ⌈lg(m)⌉ many variables and 2<sup>n</sup> clauses by selecting any m clauses.

In contrast to this we have the Theorem of Graham-Pollak ([15]), which states bcp(K<sub>m</sub>) = m − 1. Thus there exists a precise-realisation of K<sub>m</sub> with deficiency 1 (which is optimal among precise-realisations), and in the already mentioned Corollary 5 we will see an example for that (the simplest example). More generally, in Subsection 4.3 we will indeed see that every nonempty connected graph has a minimally unsatisfiable precise-realisation F with δ(F) = 1. We note that the above optimal logarithmic realisations by full clause-sets are minimally unsatisfiable iff m is a power of two (otherwise they are satisfiable); this could be repaired by removal of literal occurrences for the non-powers of two, but we have to leave this refinement to future work, and in this paper we only consider the cases m = 2<sup>n</sup>.
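A logarithmic realisation of K<sub>m</sub> can be sketched in a few lines of Python. The choice made here (taking the first m binary codes rather than an arbitrary selection of m full clauses), the integer encoding of literals, and the function names are assumptions of this sketch.

```python
from itertools import combinations

def log_realisation(m):
    """m distinct full clauses over n = ceil(lg m) variables: clause i has
    variable v positive iff bit v-1 of i is set (literals encoded as +-v)."""
    n = (m - 1).bit_length()  # equals ceil(lg m) for every integer m >= 1
    return [frozenset(v if (i >> (v - 1)) & 1 else -v for v in range(1, n + 1))
            for i in range(m)]

def is_hitting(clauses):
    """True iff any two clauses clash, i.e. the conflict graph is complete."""
    return all(any(-l in d for l in c) for c, d in combinations(clauses, 2))

# Example: K_5 is realised with 3 variables instead of the 4 of Graham-Pollak.
k5 = log_realisation(5)
```

Since the clauses are pairwise distinct and full over the same variables, any two of them differ in the sign of at least one variable, which is why `is_hitting` succeeds.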

#### 4.2 Totally 1-singular clause-sets

Obviously, every multigraph G can be precisely-realised by a totally 1-singular clause-set F with n(F) = |E(G)|. For a connected G, these precise-realisations are minimally unsatisfiable iff G is a tree; these are exactly the marginal minimally unsatisfiable clause-sets of deficiency 1 (see Corollary 6). Otherwise they are satisfiable.
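The trivial realisation and the tree criterion can be illustrated by a small Python sketch. The brute-force satisfiability test, the integer encoding of literals and all function names are assumptions of this sketch; a path serves as the tree example and a triangle as the non-tree example.

```python
from itertools import product

def trivial_realisation(vertices, edges):
    """One fresh variable per edge (parallel edges listed repeatedly):
    positive in the clause of one endpoint, negative in that of the other."""
    clauses = {v: set() for v in vertices}
    for variable, (a, b) in enumerate(edges, start=1):
        clauses[a].add(variable)
        clauses[b].add(-variable)
    return clauses

def satisfiable(clauses):
    """Brute-force SAT test over all total assignments (tiny inputs only)."""
    clauses = list(clauses)
    variables = sorted({abs(l) for c in clauses for l in c})
    return any(all(any(phi[abs(l)] == (l > 0) for l in c) for c in clauses)
               for bits in product((False, True), repeat=len(variables))
               for phi in [dict(zip(variables, bits))])

def minimally_unsatisfiable(clauses):
    """Unsatisfiable, but satisfiable after removal of any single clause."""
    clauses = list(clauses)
    return (not satisfiable(clauses)
            and all(satisfiable(clauses[:i] + clauses[i + 1:])
                    for i in range(len(clauses))))

path = trivial_realisation([1, 2, 3], [(1, 2), (2, 3)])              # a tree
triangle = trivial_realisation([1, 2, 3], [(1, 2), (2, 3), (1, 3)])  # not a tree
```

The path yields the clauses {1}, {−1, 2}, {−2}, which are minimally unsatisfiable, while the triangle yields a satisfiable clause-set.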

#### 4.3 Totally singular MUs

It is well-known that every connected graph G with m := |V(G)| ≥ 1 vertices has a claw-decomposition with m − 1 claws, and thus bcp(G) ≤ m − 1. Here a "claw" is a special biclique, with one side having exactly one vertex. The proof uses an elimination-sequence v<sub>1</sub>, …, v<sub>m−1</sub> of G, which removes one vertex after the other (including incident edges) such that always a (nonempty) connected graph is maintained. That is, G<sub>0</sub> := G, while G<sub>i</sub> := G<sub>i−1</sub> − v<sub>i</sub> for i = 1, …, m − 1: the defining property of "elimination-sequence" is that each G<sub>i</sub> is connected (and nonempty; eliminating the last vertex would yield a superfluous claw, with one side being empty). Here for a graph G and a vertex v ∈ V(G) we define G − v := (V(G) \ {v}, {e ∈ E(G) : v ∉ e}).

The existence of an elimination-sequence, and computing it in linear time in the size of the graph, can be accomplished as follows (this is well-known, see e.g. [6, Proposition 1.4.1], but for completeness we discuss it here):
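One standard way to obtain such a sequence in linear time, sketched here in Python under the assumption that the graph is given as an adjacency dictionary, is to remove the vertices in reverse breadth-first discovery order: the remaining vertices always form a prefix of the discovery order, and every vertex in such a prefix (except the root) has its BFS parent in the prefix. The function names are hypothetical, and this is not necessarily the procedure of [6].

```python
from collections import deque

def elimination_sequence(adj):
    """Vertices v_1, ..., v_{m-1} of a connected graph whose successive
    removal always leaves a nonempty connected graph."""
    root = next(iter(adj))
    order, seen, queue = [root], {root}, deque([root])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                order.append(w)
                queue.append(w)
    if len(order) != len(adj):
        raise ValueError("graph is not connected")
    return order[:0:-1]  # reverse discovery order, with the root left out

def is_connected(adj, vertices):
    """True iff the subgraph induced by `vertices` is nonempty and connected."""
    vertices = set(vertices)
    if not vertices:
        return False
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w in vertices and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == vertices

# Example graph: a triangle a, b, d with a pendant vertex c attached to b.
adj = {"a": {"b", "d"}, "b": {"a", "c", "d"}, "c": {"b"}, "d": {"a", "b"}}
sequence = elimination_sequence(adj)
```

The helper `is_connected` is only used to check the defining property of the sequence on examples.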


A claw in a biclique-partition is a singular variable in the corresponding realisation. Thus we obtain that G has a totally singular realisation F with m − 1 variables (and m clauses, thus of deficiency 1). Now indeed the F constructed in this way are exactly the minimally unsatisfiable Horn clause-sets, and this correspondence, based on unit-clause propagation, we discuss in this subsection.

It is useful to introduce the following three classes of clause-sets:


For a general overview on minimally unsatisfiable formulas see [17], while [5, Corollary 10] seems the first source for the fact that HMUs have deficiency 1.

Lemma 2. The class HMU is generated by the following process (each step called a singular positive unit-extension):


Proof. It is easy to see that all generated clause-sets are elements of HMU. It remains to show that all F ∈ HMU can be generated; we show this by induction on n(F). For n(F) = 0 we have F = {⊥}, which is the base case of the generation process. So assume n(F) > 0. F must contain a positive unit-clause {v} (otherwise every clause would contain a negative literal, due to the Horn property, and then setting all variables to 0 would be a satisfying assignment). Since F is minimally unsatisfiable, no clause other than {v} contains the positive literal v. Now setting v to 1 produces F′ ∈ HMU, to which we can apply the induction hypothesis, and from F′ we obtain F by one step of singular positive unit-extension with v. □

The following four properties of HMUs all follow easily from the generation process of Lemma 2 by induction:

Corollary 3. Clause-sets F ∈ HMU have the following properties:


So HMUs precisely-realise nonempty connected graphs, and indeed they realise exactly those:

Theorem 2. For every connected nonempty (finite) graph G one can construct in linear time (in the length of G, i.e., in |V(G)| + |E(G)|) an HMU precisely-realising G.

Proof. For G compute an elimination-sequence v<sub>1</sub>, …, v<sub>m−1</sub> as explained at the beginning of the subsection, and use these vertices as variables for the generation process according to Lemma 2, where F<sub>0</sub> is the set of neighbours. □
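The construction in this proof can be illustrated by a small sketch. It rests on an assumption about the generation process: we read a singular positive unit-extension step as adding a new unit-clause {v} and inserting the literal ¬v into a chosen nonempty set of existing clauses. Under that reading, the clause of an eliminated vertex v<sub>i</sub> receives the positive literal of variable i, and the clauses of its neighbours in the current graph receive the negative literal. The function names and the hand-picked elimination-sequence below are hypothetical; the brute-force satisfiability check is only meant for tiny examples.

```python
from itertools import product


def hmu_from_elimination(adj, seq):
    """One clause per vertex; literals are signed integers (DIMACS style)."""
    clause = {v: set() for v in adj}
    remaining = set(adj)
    for var, v in enumerate(seq, start=1):
        remaining.remove(v)
        clause[v].add(var)              # the only positive occurrence of var
        for w in adj[v] & remaining:    # neighbours of v in the current graph
            clause[w].add(-var)
    return {v: frozenset(c) for v, c in clause.items()}


def clashes(c, d):
    """Number of variables on which the clauses c and d clash."""
    return sum(1 for lit in c if -lit in d)


def satisfiable(clauses, n):
    """Brute force over all 2**n assignments of variables 1..n."""
    for bits in product((False, True), repeat=n):
        if all(any((lit > 0) == bits[abs(lit) - 1] for lit in c) for c in clauses):
            return True
    return False


# The 4-cycle 0-1-2-3-0, eliminating 3, then 2, then 1.
c4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
F = hmu_from_elimination(c4, [3, 2, 1])
```

For this example every edge of the cycle corresponds to exactly one clashing variable and the two non-adjacent pairs do not clash, which is what "precisely-realising" asks for.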

The novelty of Theorem 2 from the graph-theoretical perspective lies in relating biclique partitions by claws with realisations by HMUs (note that realisation by any totally singular clause-set of deficiency 1 is trivial). The restriction to graphs (without parallel edges) is natural here, since our main interest is in imprecise-realisations (using as few variables as possible). A related result for precise-realisations of multigraphs is given in [27], where it is shown (in graph-theoretical language) that for every graph G the multigraph G′ with V(G) = V(G′), which has as many edges between two vertices as is given by their distance in G, has a precise-realisation F with δ(F) ≥ 1; such an F yields a so-called "addressing" of G.

Corollary 4. For each DQCNF F there is a gcg-equivalent F′ such that the global slice of F′ is a variable-disjoint union of HMUs.

Recall that a "hitting clause-set" is a clause-set F such that every two (different) clauses clash; full clause-sets are a special case. In other words, hitting clause-sets are exactly the realisations of complete graphs.

Corollary 5. The hitting HMUs are exactly those where for each singular unit-extension step F<sub>0</sub> = F holds. For every n ∈ ℕ<sub>0</sub> there is up to isomorphism exactly one such clause-set, called S<sub>n</sub>, with n(S<sub>n</sub>) = n.

Corollary 6. The totally 1-singular HMUs are exactly those where for each singular unit-extension step |F<sub>0</sub>| = 1 holds; they precisely-realise exactly all trees.

### 5 A basic generator

The main target of this first experimental evaluation is the validation or refutation of the Hypothesis SIB: "Small Is Better" — the smaller the number of variables in the realisation, the easier to solve.

First, to generate test-instances, we take the simplest approach for our generator, focusing on generating 2QCNFs. For 2QCNFs, the global variables are all the universal variables (for more information, see Section 3), and thus the universal slice is the same as the global slice. The following is an example of 2QCNF in the standard QDIMACS form, with 6 variables and 4 clauses:


For the presentation, existential literals precede the universal literals, using a separator "|". The existential slice is {{−1, −3, 4}, {−1, −2, 3}, {1, −2, 5}, {1, 3, −4}}, while the universal (global) slice is {{5}, {−5}, {6}, {−6}}. In Section 3 we considered the case of connected graphs. Real world instances have indeed a large number of connected (global) components, and so we are using C ∈ N many components. Altogether the parameters (C, p, n) specify the generated 2QCNF, where p ∈ N is the (binary log of the) number of vertices in a component, and n ∈ N is the total number of existential variables.

For the component-conflict-graph of the universal slice, we use the complete graph with m := 2<sup>p</sup> vertices (clauses), and q := ½ m(m − 1) edges; this is the simplest case where we have an exponential separation between the optimum realisation and the HMU-realisation. So the total number of generated clauses is C · m. In the above QDIMACS we have C = 2 and p = 1 (the smallest value to obtain a proper 2QCNF), thus q = 1. For the existential slice we choose a random 3-CNF with n variables and C · m clauses; note that components of the conflict graph do not play any role here. We use the three realisations from Section 4 (for each component, with m clauses):


Example 2. Below we display a generated 2QCNF for all three realisations, with (C, p, n) = (2, 2, 4). Thus m = 2<sup>2</sup> = 4 clauses per component, making 2 · 4 = 8 clauses in total. The existential slice is a uniform random 3-CNF with 4 variables and 8 clauses. For each component, the trivial realisation uses q = ½ · 4 · 3 = 6 variables, HMU uses 3 variables, and log uses 2 variables. Leftmost is the trivial realisation, then the HMU realisation, and finally the logarithmic realisation:
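The variable counts of Example 2 can be re-derived with a short sketch that builds the three realisations of the complete graph on m = 2<sup>p</sup> clauses for a single component. The encodings are assumptions on our part, inferred from the descriptions in Section 4: the trivial realisation uses one fresh variable per edge, the HMU realisation repeats a unit-extension over all existing clauses, and the logarithmic realisation takes all full clauses over p variables. All function names are hypothetical.

```python
from itertools import combinations, product


def trivial_realisation(m):
    """One fresh variable per edge of the complete graph on m clauses."""
    clauses = [set() for _ in range(m)]
    for var, (i, j) in enumerate(combinations(range(m), 2), start=1):
        clauses[i].add(var)
        clauses[j].add(-var)
    return [frozenset(c) for c in clauses]


def hmu_realisation(m):
    """Hitting Horn clause-set with m clauses and m - 1 variables."""
    clauses = [frozenset()]                      # start from the empty clause
    for var in range(1, m):
        clauses = [c | {-var} for c in clauses] + [frozenset({var})]
    return clauses


def log_realisation(p):
    """All 2**p full clauses over the variables 1..p."""
    return [frozenset(sign * var for sign, var in zip(signs, range(1, p + 1)))
            for signs in product((1, -1), repeat=p)]


def num_vars(clauses):
    return len({abs(lit) for c in clauses for lit in c})


def realises_complete_graph(clauses):
    """Every two different clauses clash in at least one variable."""
    return all(any(-lit in d for lit in c) for c, d in combinations(clauses, 2))


p = 2
m = 2 ** p
counts = (num_vars(trivial_realisation(m)),
          num_vars(hmu_realisation(m)),
          num_vars(log_realisation(p)))
```

With p = 2 the tuple `counts` holds the numbers of universal variables per component for the trivial, HMU, and logarithmic realisation, in that order.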


### 6 Experimental results

We use two top-performing 2QCNF solvers, DepQBF [20] and CADET [23], chosen on the basis of the QBFEVAL 2020 competition results [22]. In order to avoid the known high variability on satisfiable instances, for this first experimental evaluation we only considered unsatisfiable instances (discarding satisfiable ones). Recall that we use parameter values (C, p, n) according to Section 5. For each parameter value, we generated 1000 instances and report the results only on the unsatisfiable instances. In general we tried to select values such that the created benchmarks are of medium hardness, taking at most around one hour, considering all three realisations (trivial, HMU, logarithmic). It turned out that the trivial realisation mostly produced very hard instances, and so our selection process focuses on HMU and logarithmic realisations. In general we found Hypothesis SIB ("small is better") well validated: on all parameters considered, both solvers solved more instances with the logarithmic realisation and had a better average runtime than with the HMU-realisation. All the experiments were conducted on Intel(R) Xeon(R) E5-2620 v4 @ 2.10GHz CPUs with a time limit of 3600s and a memory limit of 8 GB per instance. Memory usage for instance generation and for solving the generated benchmarks was minimal (< 1 GB). The summary of the results is as follows, using rounded runtimes:


The solver column labelled with "D" refers to DepQBF, while "C" is CADET. The mean and median consider only instances solved by the corresponding solver.

For example, for Row 1D, the HMU-mean 280 as well as the HMU-median 82 relate only to the runtimes on the 978 HMU-instances solved by DepQBF.

The table shows that the Hypothesis SIB is mostly validated for the 2·6 = 12 rows (only comparing HMU and log now). First there are 6 fully conforming rows, namely 1DC, 2D, 3D, 5D, and 6C, where more instances were solved for the log-realisation, and this also with better mean and median times. Then there are 4 mostly conforming rows 2C, 3C, 4D, and 5C, where we also have clearly more solved log-realisations, while mean or median could be better for HMU, but only for a small number of instances. This leaves two exceptional rows: 4C and 6D. We need to leave 4C for more extensive experimentation: these instances were very hard for CADET, and the number of solved instances is too small for a statistical analysis. For the 6D instances with (C, p, n) = (14, 4, 11), the median solving time for the HMU realisation is better (92s versus 172s), and it solves nearly as many instances as the logarithmic realisation. However, the average solving time for the HMU realisation is worse than for the logarithmic realisation (301s versus 294s; timeouts are not included in these averages). This warrants further investigation, and the density plot (the second plot shows times ≤ 1000s only) can provide additional insight:

The mean values shown in the plots now include the 3600s timeout, which for HMU increases the mean to 397s. The second plot, which shows times ≤ 1000s, reveals that the HMU realisation solves several instances faster than the logarithmic realisation, but its performance deteriorates over time, with fewer and fewer instances solved. The first plot, which shows the overall picture, has a spike for the HMU realisation at 3600s at the tail end, indicating that 29 instances timed out (while the logarithmic realisation solved all instances). When the timeout is increased from 3600s to 18000s, the mean of the HMU realisation increases to 445s.

On these instances we could devise a portfolio strategy in which both HMU and logarithmic realisation instances run in parallel, while aborting HMU realisation relatively quickly — in this way one could achieve a faster average solving time overall. While this parameter triple is interesting, more investigation is required to understand the precise causes of this behaviour.

#### 7 Conclusion and Outlook

We have introduced the global conflict graph of DQCNFs, which represents the clashes (conflicts) between global literals; for 2QCNFs the global literals are just the universal literals. We have shown that the corresponding global slice can be replaced by anything else which just reproduces the conflict graph. We then switched to investigating (CNF-)realisations of arbitrary graphs, concentrating on the three most basic classes, given by full clause-sets (complete graphs only), by variables occurring only twice, and by HMUs (Horn minimally unsatisfiable clause-sets). For the latter we showed that they can realise everything, and thus yield the upper bound m − k on the number of global variables needed for any DQCNF with m clauses and k connected components of the conflict graph; such a transformation can be computed in linear time. We then created families of 2QCNF instances, with a relatively small number of connected components, and consisting of small complete graphs; together with any of the three basic realisations (full-log, trivial, HMU) this creates the universal slice, while the existential slice is given by a random 3-CNF. We investigated whether in this setting fewer universal variables indeed mean easier solving, and found this in general well supported. There are many future avenues for research and practice:


Of course, insights into the behaviour of solvers are an important goal here.

On the theory side, a fundamental question here is to investigate which restricted classes of global conflict graphs still yield completeness for the respective complexity classes. Finally it seems natural to conjecture that allowing arbitrary transformations of the global slice can have a huge influence on various complexity issues, like proof-length in various calculi, and the complexities of strategy extraction.

### References



## Certificates for Probabilistic Pushdown Automata via Optimistic Value Iteration

Tobias Winkler (✉) and Joost-Pieter Katoen

RWTH Aachen University, Aachen, Germany {tobias.winkler,katoen}@cs.rwth-aachen.de

Abstract. Probabilistic pushdown automata (pPDA) are a standard model for discrete probabilistic programs with procedures and recursion. In pPDA, many quantitative properties are characterized as least fixpoints of polynomial equation systems. In this paper, we study the problem of certifying that these quantities lie within certain bounds. To this end, we first characterize the polynomial systems that admit easy-to-check certificates for validating bounds on their least fixpoint. Second, we present a sound and complete Optimistic Value Iteration algorithm for computing such certificates. Third, we show how certificates for polynomial systems can be transferred to certificates for various quantitative pPDA properties. Experiments demonstrate that our algorithm computes succinct certificates for several intricate example programs as well as stochastic context-free grammars with > 10<sup>4</sup> production rules.

Keywords: Probabilistic Pushdown Automata · Probabilistic Model Checking · Certified Algorithms · Probabilistic Recursive Programs.

### 1 Introduction

Complex software is likely to contain bugs. This applies in particular to model checking tools. This is a serious problem, as the possibility of such bugs compromises the trust one can put in the verification results, rendering the process of formal modeling and analysis less useful. Ideally, the implementation of a model checker should be formally verified itself [15]. However, due to the great complexity of these tools, this is often out of reach in practice. Certifying algorithms [31] mitigate this problem by providing an easy-to-check certificate along with their regular output. This means that there exists a verifier that, given the input problem, the output, and the certificate, constructs a formal proof that the output is indeed correct. The idea is that the verifier is much simpler than the algorithm, and thus likely to be bug-free or even amenable to formal verification.

This paper extends the recent line of research on probabilistic certification [19,23,24,41] to probabilistic pushdown automata [13,30] (pPDA). pPDA and related models have applications in, amongst others, pattern recognition [39], computational biology [28], and speech recognition [25]. They are moreover a natural operational model for programs with procedures, recursion, and (discrete) probabilistic constructs such as the ability to flip coins. With the advent of probabilistic programming [32] as a paradigm for model-based machine learning [6], such programs have received lots of attention recently. Moreover, several efficient algorithms such as Hoare's quicksort with randomized pivot selection (e.g. [26]) are readily encoded as probabilistic recursive programs.

© The Author(s) 2023

This work is supported by the DFG research training group 2236 UnRAVeL and the ERC advanced research grant 787914 FRAPPANT.

S. Sankaranarayanan and N. Sharygina (Eds.): TACAS 2023, LNCS 13994, pp. 391–409, 2023. https://doi.org/10.1007/978-3-031-30820-8\_24

Fig. 1: Left: A stochastic context-free grammar (SCFG; e.g. [16]) and the associated positive polynomial system (PPS) which encodes the termination probabilities of each non-terminal, assuming production rules are taken uniformly at random. Right: The curves defined by the two equations. The least fixpoint (lfp) is ≈ (0.66, 0.70). The thin colored area to the top right of the lfp is the set of inductive, self-certifying upper bounds on the lfp.

A pPDA can be seen as a purely probabilistic variant of a standard pushdown automaton: Instead of reading an input word, it takes its transitions randomly based on fixed probability distributions over successor states. Quantities of interest in pPDA include reachability probabilities [13], expected runtimes [8], variances [14], satisfaction probabilities of temporal logic formulas [47,42], and others (see [7] for an overview). pPDA are equivalent to Recursive Markov Chains [17]. In the past two decades there have been significant research efforts on efficient approximative algorithms for pPDA, especially a decomposed variant of Newton iteration [16,27,11,17,12,10,40] which provides guaranteed lower, and occasionally upper [10,12] bounds on key quantities. However, even though implementations might be complex [46], these algorithms are not certifying.

Our technique for certificate generation is a non-trivial extension of Optimistic Value Iteration [22] (OVI) to pPDA. In a nutshell, the idea of OVI is to compute some lower bound l on the solution—which can be done using an approximative iterative algorithm—and then optimistically guess an upper bound u = l + ε and verify that the guess was correct. Prior to our paper, OVI had only been considered in Markov Decision Processes (MDP) [22] and Stochastic Games (SG) [1], where it is used to compute bounds on, e.g., maximal reachability probabilities. The upper bounds computed by OVI have a special property: They are self-certifying (also called inductive in our paper): Given the system and the bounds, one can check very easily that the bounds are indeed correct.

However, pPDA are much more complicated than MDP or SG for the following reasons: (i) pPDA may induce infinite-state Markov processes due to their unbounded stack; (ii) the analysis of pPDA requires solving non-linear equations; (iii) the complexity of basic decision problems is generally higher than in MDP/SG. For example, reachability in MDP is characterized as the least fixpoint (lfp) of a piece-wise linear function that can be computed in PTIME via, e.g., LP solving. On the other hand, reachability in pPDA requires computing a fixed point of a positive polynomial function, leading to a PSPACE complexity bound [13]. See Figure 1 for an example.

Contributions. Despite the difficulties mentioned above, we show in this paper that the general idea of OVI can be extended to pPDA, yielding a practically feasible algorithm with good theoretical properties. More concretely:

Contribution 1 We present an OVI-style algorithm for computing inductive upper bounds of any desired precision ε > 0 on the lfp of a positive polynomial system. Compared to the existing OVI [22], the key novelty of our algorithm is to compute a certain direction v in which to guess, i.e., the guess is u = l + εv rather than u = l + ε. The direction v is an estimate of a certain eigenvector. This ensures that we eventually hit an inductive bound, even if the latter lie in a very "thin strip" as in Figure 1, and yields a provably complete algorithm that is guaranteed to find an inductive bound in finite time (under mild assumptions).

Contribution 2 We implement our algorithm in the software tool pray and compare the new technique to an out-of-the-box approach based on SMT solving, as well as to standard OVI with a simpler guessing heuristic.

Related Work. Certification of pPDA has not yet been addressed explicitly, but some existing technical results go in a similar direction. For instance, [17, Prop. 8.7] yields certificates for non-termination in SCFG, but they require an SCC decomposition for verification. Farkas certificates for MDP [19] are more closely related to our idea of certificates. They require checking a set of linear constraints. A symbolic approach to verify probabilistic recursive programs on the syntax level including inductive proof rules for upper bounds was studied in [35]. A higher-order generalization of pPDA was introduced in [29], and an algorithm for finding upper bounds inspired by the Finite Element method was proposed. Applications of PPS beyond the analysis of pPDA include the recent factor graph grammars [9] as well as obtaining approximate counting formulas for many classes of trees in the framework of analytic combinatorics [18]. Regarding software tools, PReMo [46] implements iterative algorithms for lower bounds in Recursive Markov Chains, but it supports neither certificates nor upper bounds.

Paper Outline. We review the relevant background information on PPS in Section 2. Section 3 presents our theoretical results on inductive upper bounds in PPS as well as the new Optimistic Value Iteration algorithm. In Section 4 we explain how inductive bounds in PPS are used to certify quantitative properties of pPDA. The experimental evaluation is in Section 5. We conclude in Section 6. A full version of this paper is available online [44].

### 2 Preliminaries

Notation for Vectors. All vectors in this paper are column vectors and are written in boldface, e.g., $\mathbf{u} = (u_1, \dots, u_n)^T$. For vectors $\mathbf{u}, \mathbf{u}'$, we write $\mathbf{u} \le \mathbf{u}'$ if $\mathbf{u}$ is component-wise less than or equal to $\mathbf{u}'$. Moreover, we write $\mathbf{u} < \mathbf{u}'$ if $\mathbf{u} \le \mathbf{u}'$ and $\mathbf{u} \ne \mathbf{u}'$, and $\mathbf{u} \prec \mathbf{u}'$ if $\mathbf{u}$ is component-wise strictly smaller than $\mathbf{u}'$. The zero vector is denoted $\mathbf{0}$. The max norm of a vector $\mathbf{u}$ is $\|\mathbf{u}\|_\infty = \max_{1 \le i \le n} |u_i|$. We say that $\mathbf{u}$ is normalized if $\|\mathbf{u}\|_\infty = 1$.

Positive Polynomial Systems (PPS). Let $n \ge 1$ and $\mathbf{x} = (x_1, \dots, x_n)^T$ be a vector of variables. An $n$-dimensional PPS is an equation system of the form

$$x\_1 = f\_1(x\_1, \dots, x\_n) \quad \dots \quad x\_n = f\_n(x\_1, \dots, x\_n)$$

where for all $1 \le i \le n$, the function $f_i$ is a polynomial with non-negative real coefficients. An example PPS is the system $x = \frac{1}{2}(1 + xy^2)$, $y = \frac{1}{3}(1 + x + y^2)$ from Figure 1. We also use vector notation for PPS: $\mathbf{x} = \mathbf{f}(\mathbf{x}) = (f_1(\mathbf{x}), \dots, f_n(\mathbf{x}))^T$.

We write $\overline{\mathbb{R}}_{\ge 0} = \mathbb{R}_{\ge 0} \cup \{\infty\}$ for the extended non-negative reals. By convention, for all $a \in \overline{\mathbb{R}}_{\ge 0}$, $a \le \infty$, $a + \infty = \infty + a = \infty$, and $a \cdot \infty = \infty \cdot a$ equals $0$ if $a = 0$ and $\infty$ otherwise. For $n \ge 1$, the partial order $(\overline{\mathbb{R}}^n_{\ge 0}, \le)$ is a complete lattice, i.e., all subsets of $\overline{\mathbb{R}}^n_{\ge 0}$ have an infimum and a supremum. In particular, there exists a least element $\mathbf{0}$ and a greatest element $\boldsymbol{\infty} = (\infty, \dots, \infty)^T$. Every PPS induces a monotone function $\mathbf{f} \colon \overline{\mathbb{R}}^n_{\ge 0} \to \overline{\mathbb{R}}^n_{\ge 0}$, i.e., $\mathbf{u} \le \mathbf{v} \implies \mathbf{f}(\mathbf{u}) \le \mathbf{f}(\mathbf{v})$. By the Knaster-Tarski fixpoint theorem, the set of fixpoints of $\mathbf{f}$ is also a complete lattice, and thus there exists a least fixpoint (lfp) denoted by $\mu\mathbf{f}$.

In general, the lfp $\mu\mathbf{f}$ is a vector which may contain $\infty$ as an entry. For instance, this happens in the PPS $x = x + 1$. A PPS $\mathbf{f}$ is called feasible if $\mu\mathbf{f} \prec \boldsymbol{\infty}$ (or equivalently, $\mu\mathbf{f} \in \mathbb{R}^n_{\ge 0}$). The Knaster-Tarski theorem also implies:

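#### Lemma 1 (Inductive upper bounds). For all $\mathbf{u} \in \overline{\mathbb{R}}^n_{\ge 0}$ it holds that

$$\mathbf{f}(\mathbf{u}) \le \mathbf{u} \quad \text{implies} \quad \mu \mathbf{f} \le \mathbf{u} \text{ .}$$

Such a vector $\mathbf{u}$ with $\mathbf{u} \prec \boldsymbol{\infty}$ is called an inductive upper bound.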

If $\mathbf{f}$ is feasible, then $\mu\mathbf{f}$ is obviously an inductive upper bound. The problem is that $\mu\mathbf{f}$ may be irrational even if $\mathbf{f}$ has rational coefficients only (see Example 1 below) and thus cannot easily be represented exactly. In Section 3 we show under which conditions there exist rational inductive upper bounds $\mathbf{u} \in \mathbb{Q}^n_{\ge 0}$.

Problem statement of this paper: Given a feasible PPS $\mathbf{f}$, find a rational inductive upper bound $\mathbf{u} \ge \mu\mathbf{f}$.

A PPS is called clean if $\mu\mathbf{f} \succ \mathbf{0}$. Every PPS can be cleaned in linear time by identifying and removing the variables that are assigned 0 in the lfp [17,12].

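Given a PPS $\mathbf{f}$ and a point $\mathbf{u} \in \mathbb{R}^n_{\ge 0}$, we define the Jacobi matrix of $\mathbf{f}$ at $\mathbf{u}$ as the $n \times n$-matrix $\mathbf{f}'(\mathbf{u})$ with coefficients $\mathbf{f}'(\mathbf{u})_{i,j} = \frac{\partial}{\partial x_j} f_i(\mathbf{u})$ for $1 \le i, j \le n$.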

Example 1. Consider the example PPS $\mathbf{f}_{ex}$ with variables $\mathbf{x} = (x, y)^T$:

$$x = f_1(x, y) = y + 0.1 \qquad y = f_2(x, y) = 0.2x^2 + 0.8xy + 0.1 \text{ .}$$

The line and the hyperbola defined by these equations are depicted in Figure 2. The fixpoints of $\mathbf{f}_{ex}$ are the intersections of these geometric objects; in this case there are two. In particular, $\mathbf{f}_{ex}$ is feasible and its lfp is

$$\mu \mathbf{f}_{ex} = \left( (27 - \sqrt{229})/50,\ (22 - \sqrt{229})/50 \right)^T \approx (0.237, 0.137)^T \text{ .}$$

Therefore, $\mathbf{f}_{ex}$ is clean as $\mu\mathbf{f}_{ex} \succ \mathbf{0}$. The Jacobi matrix of $\mathbf{f}_{ex}$ is

$$\begin{array}{rcl}f'\_{ex}(x,y) &=& \begin{pmatrix} \frac{\partial}{\partial x}f\_1 & \frac{\partial}{\partial y}f\_1\\ \frac{\partial}{\partial x}f\_2 & \frac{\partial}{\partial y}f\_2 \end{pmatrix} \\ &=& \begin{pmatrix} 0 & 1\\ 0.4x + 0.8y & 0.8x \end{pmatrix} \end{array}$$

Note that the lfp $\mu\mathbf{f}_{ex}$ contains irrational numbers. In the above example, these irrational numbers could still be represented using square roots because the fixpoints of $\mathbf{f}_{ex}$ are the zeros of a quadratic polynomial. However, there are PPS whose lfp cannot be expressed using radicals, i.e., square roots, cube roots, etc. [16]. This means that in general, there is no easy way to compute the lfp exactly. It is thus desirable to provide bounds, which we do in this paper. △
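Lemma 1 reduces checking a candidate bound to a single evaluation of the system. The following sketch illustrates this for the PPS of Example 1 with exact rational arithmetic. The candidate vectors `good` and `bad` are our own illustrative choices, and the helper name `is_inductive` is hypothetical.

```python
import math
from fractions import Fraction as Fr


def f_ex(u):
    """The example PPS: x = y + 0.1, y = 0.2 x^2 + 0.8 x y + 0.1."""
    x, y = u
    return (y + Fr(1, 10),
            Fr(2, 10) * x * x + Fr(8, 10) * x * y + Fr(1, 10))


def is_inductive(f, u):
    """Check f(u) <= u component-wise; by Lemma 1 this certifies lfp <= u."""
    return all(fi <= ui for fi, ui in zip(f(u), u))


good = (Fr(3, 10), Fr(1, 5))   # rational and inductive
bad = (Fr(1, 4), Fr(7, 50))    # lies above the lfp, but is not inductive

# Floating-point value of the closed-form lfp, for comparison only.
lfp = ((27 - math.sqrt(229)) / 50, (22 - math.sqrt(229)) / 50)
```

The vector `bad` shows that a valid upper bound on the lfp need not be inductive: it lies component-wise above the lfp, yet the second component of `f_ex(bad)` exceeds the second component of `bad`.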

Matrices and Eigenvectors. Let $M$ be a real $n \times n$-matrix. We say that $M$ is non-negative (in symbols: $M \ge 0$) if it has no negative entries. $M$ is called irreducible if for all $1 \le i, j \le n$ there exists $0 \le k < n$ such that $(M^k)_{i,j} \ne 0$. It is known that $M$ is irreducible iff the directed graph $G_M = (\{1, \dots, n\}, E)$ with $(i, j) \in E$ iff $M_{i,j} \ne 0$ is strongly connected. A maximal irreducible submatrix of $M$ is a square submatrix induced by a strongly connected component of $G_M$. The period of an irreducible $M$ is the greatest common divisor of the lengths of all cycles in $G_M$. It is instructive to note that PPS $\mathbf{x} = \mathbf{f}(\mathbf{x})$ are generalizations of linear equation systems of the form $\mathbf{x} = M\mathbf{x} + \mathbf{c}$, with $M \ge 0$ and $\mathbf{c} \ge \mathbf{0}$. Moreover, note that for any PPS $\mathbf{f}$ it holds that $\mathbf{f}'(\mathbf{u}) \ge 0$ for all $\mathbf{u} \succ \mathbf{0}$.

An eigenvector of an $n \times n$-matrix $M$ with eigenvalue $\lambda \in \mathbb{C}$ is a (complex) vector $\mathbf{v} \ne \mathbf{0}$ satisfying $M\mathbf{v} = \lambda\mathbf{v}$. There are at most $n$ different eigenvalues. The spectral radius $\rho(M) \in \mathbb{R}_{\ge 0}$ is the largest absolute value of the eigenvalues of $M$. The following is a fundamental theorem about non-negative matrices:

#### Theorem 1 (Perron-Frobenius; e.g. [37]). Let M ≥ 0 be irreducible.


The unique eigenvector $\mathbf{v} \succ \mathbf{0}$ with $\|\mathbf{v}\|_\infty = 1$ of an irreducible non-negative matrix $M$ is called the Perron-Frobenius eigenvector of $M$.

Strongly Connected Components. To each PPS $\mathbf{f}$ we associate a finite directed graph $G_{\mathbf{f}} = (\{x_1, \dots, x_n\}, E)$, which, intuitively speaking, captures the dependency structure among the variables. Formally, $(x_i, x_j) \in E$ if the polynomial $f_i$ depends on $x_j$, i.e., $x_j$ appears in at least one term of $f_i$ with a non-zero coefficient. This is equivalent to saying that the partial derivative $\frac{\partial}{\partial x_j} f_i$ is not the zero polynomial. We say that $\mathbf{f}$ is strongly connected if $G_{\mathbf{f}}$ is strongly connected, i.e., for each pair $(x_i, x_j)$ of variables, there exists a path from $x_i$ to $x_j$ in $G_{\mathbf{f}}$. For instance, $\mathbf{f}_{ex}$ from Example 1 is strongly connected because the dependency graph has the edges $E = \{(x, y), (y, x), (y, y)\}$. Strong connectivity of PPS is a generalization of irreducibility of matrices; indeed, a matrix $M$ is irreducible iff the PPS $\mathbf{x} = M\mathbf{x}$ is strongly connected. We often use the fact that $\mathbf{f}'(\mathbf{u})$ for $\mathbf{u} \succ \mathbf{0}$ is irreducible iff $\mathbf{f}$ is strongly connected.
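The dependency graph is easy to compute from a sparse representation of the polynomials. The sketch below assumes a hypothetical encoding in which each polynomial is a dictionary from exponent tuples to coefficients; variable indices 0 and 1 stand for x and y. The reachability routine is deliberately naive (quadratic in the number of edges) to keep the example short.

```python
def dependency_edges(pps):
    """Edges (i, j) such that polynomial f_i depends on variable x_j."""
    return {(i, j)
            for i, poly in enumerate(pps)
            for exponents, coeff in poly.items() if coeff != 0
            for j, e in enumerate(exponents) if e > 0}


def reachable_from(start, edges):
    """Set of vertices reachable from `start` along the given edges."""
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for a, b in edges:
            if a == v and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen


def strongly_connected(pps):
    """Vertex 0 reaches every vertex and is reached from every vertex."""
    edges = dependency_edges(pps)
    reverse = {(b, a) for a, b in edges}
    everything = set(range(len(pps)))
    return (reachable_from(0, edges) == everything
            and reachable_from(0, reverse) == everything)


# x = y + 0.1,  y = 0.2 x^2 + 0.8 x y + 0.1
f_ex = [{(0, 1): 1.0, (0, 0): 0.1},
        {(2, 0): 0.2, (1, 1): 0.8, (0, 0): 0.1}]

# x = 0.5 x + 0.5,  y = x: here x does not depend on y.
not_connected = [{(1, 0): 0.5, (0, 0): 0.5},
                 {(1, 0): 1.0}]
```

For the example system the computed edge set corresponds to the edges (x, y), (y, x), (y, y) listed above.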

PPS are usually analyzed in a decomposed fashion by considering the subsystems induced by the strongly connected components (SCCs) of $G_{\mathbf{f}}$ in bottom-up order [16]. Here we also follow this approach and therefore focus on strongly connected PPS. The following was proved in [17, Lem. 6.5] and later generalized in [12, Thm. 4.1] (also see remark below [12, Prop. 5.4] and [17, Lem. 8.2]):

Theorem 2 ([17,12]). If $\mathbf{f}$ is feasible, strongly connected and clean, then for all $\mathbf{u} < \mu\mathbf{f}$, we have $\rho(\mathbf{f}'(\mathbf{u})) < 1$. As a consequence, $\rho(\mathbf{f}'(\mu\mathbf{f})) \le 1$.

Theorem 2 partitions all PPS $\mathbf{f}$ which satisfy its precondition into two classes: Either (1) $\rho(\mathbf{f}'(\mu\mathbf{f})) < 1$, or (2) $\rho(\mathbf{f}'(\mu\mathbf{f})) = 1$. In the next section we show that $\mathbf{f}$ admits non-trivial inductive upper bounds iff it is in class (1).

Example 2. Reconsider the PPS $\mathbf{f}_{ex}$ from Example 1. It can be shown that $\mathbf{v} = (1, \lambda_1)^T$ where $\lambda_1 \approx 0.557$ is an eigenvector of $\mathbf{f}'_{ex}(\mu\mathbf{f}_{ex})$ with eigenvalue $\lambda_1$. Thus by the Perron-Frobenius Theorem, $\rho(\mathbf{f}'_{ex}(\mu\mathbf{f}_{ex})) = \lambda_1 < 1$. As promised, there exist inductive upper bounds as can be seen in Figure 2. △
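The eigenvalue and eigenvector of Example 2 can be reproduced numerically. The sketch below uses plain power iteration with max-norm normalisation, which is our own choice of method and not taken from the paper. It converges here because the Jacobi matrix at the lfp is irreducible and has a positive diagonal entry; for periodic irreducible matrices power iteration need not converge.

```python
import math


def jacobian_ex(x, y):
    """Jacobi matrix of the example PPS at the point (x, y)."""
    return [[0.0, 1.0],
            [0.4 * x + 0.8 * y, 0.8 * x]]


def power_iteration(M, iterations=200):
    """Approximate the dominant eigenvalue and a max-norm-normalised eigenvector."""
    n = len(M)
    v = [1.0] * n
    lam = 0.0
    for _ in range(iterations):
        w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = max(w)                 # max norm of M v, since all entries are >= 0
        v = [wi / lam for wi in w]
    return lam, v


s = math.sqrt(229)
lfp = ((27 - s) / 50, (22 - s) / 50)
lam, v = power_iteration(jacobian_ex(*lfp))
```

The returned vector has first component 1 and second component equal to the eigenvalue estimate, matching the form (1, λ<sub>1</sub>) stated in the example.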

### 3 Finding Inductive Upper Bounds in PPS

In this section, we are concerned with the following problem: Given a feasible, clean, and strongly connected PPS f, find a vector 0 ≺ u ≺ ∞ such that f(u) ≤ u, i.e., an inductive upper bound on the lfp of f (see Lemma 1).

#### 3.1 Existence of Inductive Upper Bounds

An important first observation is that inductive upper bounds other than the exact lfp do not necessarily exist. As a simple counter-example consider the 1-dimensional PPS $x = \frac{1}{2}x^2 + \frac{1}{2}$. If $u$ is an inductive upper bound, then

$$
\frac{1}{2}u^2 + \frac{1}{2} \le u \implies u^2 - 2u + 1 \le 0 \implies (u-1)^2 \le 0 \implies u = 1\text{ ,}
$$

and thus the only inductive upper bound is the exact lfp $u = 1$. Another example is the PPS $\tilde{\mathbf{f}}_{ex}$ from Figure 2. What these examples have in common is the following property: Their derivative evaluated at the lfp is not invertible. Indeed, we have $\frac{\partial}{\partial x}\left(\frac{1}{2}x^2 + \frac{1}{2} - x\right) = x - 1$, and inserting the lfp $x = 1$ yields zero. The higher dimensional generalization of this property to arbitrary PPS $\mathbf{f}$ is that the Jacobi matrix of the function $\mathbf{f} - \mathbf{x}$ evaluated at $\mu\mathbf{f}$ is singular; note that this is precisely the matrix $\mathbf{f}'(\mu\mathbf{f}) - I$. Geometrically, this means that the tangent lines at $\mu\mathbf{f}$ are parallel, as can be seen in Figure 2 for the example PPS $\tilde{\mathbf{f}}_{ex}$. It should be intuitively clear from the figure that inductive upper bounds only exist if the tangent lines are not parallel. The next lemma makes this more precise:

Fig. 2: The PPS $\mathbf{f}_{ex}$ corresponds to the solid red line and the solid blue curve. Its inductive upper bounds form the shaded area above the lfp $\mu\mathbf{f}_{ex}$. Lemma 2(4) ensures that one can fit the gray "cone" pointing in direction of the Perron-Frobenius eigenvector $\mathbf{v}$ inside the inductive region. The PPS $\tilde{\mathbf{f}}_{ex}$ which comprises the dashed curve and the solid line does not have any non-trivial inductive upper bounds. Note that the tangent lines at $\mu\tilde{\mathbf{f}}_{ex}$ are parallel to each other.

Lemma 2 (Existence of inductive upper bounds). Let f be a feasible, clean, and strongly connected PPS. Then the following are equivalent:


$$f(\mu f + \delta \cdot \tilde{v}) \quad \prec \quad \mu f + \delta \cdot \tilde{v}$$

holds for all $0 < \delta \le \delta_{\max}$ and vectors $\tilde{\mathbf{v}} \ge \mathbf{v}$ with $\|\mathbf{v} - \tilde{\mathbf{v}}\|_\infty \le \varepsilon$.

The proof of Lemma 2 (see [44]) relies on a linear approximation of f via Taylor's familiar theorem as well as Theorems 1 and 2. Condition (4) of Lemma 2 means that there exists a "truncated cone"

$$\operatorname{Cone}(\mu \mathbf{f}, \mathbf{v}, \varepsilon, \delta_{\max}) = \{ \mu \mathbf{f} + \delta \tilde{\mathbf{v}} \mid 0 \le \delta \le \delta_{\max},\ \tilde{\mathbf{v}} \ge \mathbf{v},\ \|\tilde{\mathbf{v}} - \mathbf{v}\|_{\infty} \le \varepsilon \}$$

which is entirely contained in the inductive region. The "tip" of the cone is located at the lfp $\mu\mathbf{f}$, and the cone points in the direction of the Perron-Frobenius eigenvector $\mathbf{v}$, as illustrated in Figure 2 (assuming $\delta_{\max} = 1$ for simplicity). The length $\delta_{\max} > 0$ and the radius $\varepsilon > 0$ of the cone depend on $\rho(\mathbf{f}'(\mu\mathbf{f}))$, but for us it suffices that they are non-zero. Note that this cone has non-empty interior and thus contains rational-valued vectors. The idea of our Optimistic Value Iteration is to construct a sequence of guesses that eventually hits this cone.
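For the PPS of Example 1 the cone property can be probed numerically along the central direction of the cone. The sketch below is a floating-point illustration only: the sampled values of δ are arbitrary, the code does not compute δ<sub>max</sub>, and the helper names are hypothetical. It uses the fact that the Jacobi matrix at the lfp has the form [[0, 1], [a, b]], so its spectral radius is the larger root of t² − bt − a.

```python
import math


def f_ex(u):
    x, y = u
    return (y + 0.1, 0.2 * x * x + 0.8 * x * y + 0.1)


s = math.sqrt(229)
lfp = ((27 - s) / 50, (22 - s) / 50)

# Entries of the second row of the Jacobi matrix at the lfp.
a = 0.4 * lfp[0] + 0.8 * lfp[1]
b = 0.8 * lfp[0]
lam = (b + math.sqrt(b * b + 4 * a)) / 2   # larger root of t^2 - b t - a
v = (1.0, lam)                              # eigenvector, max norm 1


def cone_point(delta):
    """The point lfp + delta * v on the axis of the cone."""
    return tuple(m + delta * vi for m, vi in zip(lfp, v))


def strictly_inductive(u):
    return all(fi < ui for fi, ui in zip(f_ex(u), u))


small_steps_ok = all(strictly_inductive(cone_point(d)) for d in (0.01, 0.05, 0.1))
large_step_ok = strictly_inductive(cone_point(1.0))
```

In this experiment the small steps land in the inductive region while the step δ = 1 overshoots it, which illustrates why the cone has to be truncated.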

#### 3.2 The Optimistic Value Iteration Algorithm

The basic idea of Optimistic Value Iteration (OVI) can be applied to monotone functions of the form φ: ℝ<sup>n</sup><sub>≥0</sub> → ℝ<sup>n</sup><sub>≥0</sub> (in [22], φ is the Bellman operator of an MDP). Kleene's fixpoint theorem suggests a simple method for approximating the lfp µφ from below: Simply iterate φ starting at 0, i.e., compute the sequence l<sub>0</sub> = 0, l<sub>1</sub> = φ(l<sub>0</sub>), l<sub>2</sub> = φ(l<sub>1</sub>), etc.<sup>1</sup> In the context of MDP, this iterative scheme is known as Value Iteration (VI). VI is easy to implement, but it is difficult to decide when to stop the iteration. In particular, standard stopping criteria such as small absolute difference of consecutive approximations are formally unsound [20]. OVI and other algorithms [3,36] cope with this problem by computing not only a lower but also an upper bound on µφ. In the case of OVI, an upper bound with absolute error ≤ ε is obtained as follows (we omit some details):

	- (a) If φ(u) ≤ u holds, i.e., u is inductive, then return u.
	- (b) If not, refine u (see [22] for details). If the refined u is still not inductive, then go back to step (1) and try again with 0 < τ′ < τ.
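The unsoundness of the naive stopping criterion can be observed on a tiny instance. The following minimal Python sketch runs Kleene iteration on the one-dimensional PPS x = ½x² + ½, whose lfp is 1, and stops once two consecutive iterates differ by at most a tolerance. The function name `kleene_until_small_step` and the tolerance 10⁻⁴ are assumptions made for illustration only.

```python
def f(x):
    """One-dimensional PPS x = 1/2 * x^2 + 1/2; its least fixed point is 1."""
    return 0.5 * x * x + 0.5


def kleene_until_small_step(func, tol):
    """Iterate l <- func(l) from 0 until consecutive iterates differ by at most tol."""
    l, steps = 0.0, 0
    while True:
        nxt = func(l)
        steps += 1
        if abs(nxt - l) <= tol:
            return nxt, steps
        l = nxt


approx, steps = kleene_until_small_step(f, tol=1e-4)
error = 1.0 - approx  # distance to the true lfp
print(f"stopped after {steps} steps, error = {error:.5f}")
```

Although the last step is smaller than 10⁻⁴, the returned value is still more than 10⁻² away from the lfp, which is why an additional upper bound is needed.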

We present our variant of OVI for PPS as Algorithm 1. The main differences from the above scheme are that (i) we do not insist on Kleene iteration for obtaining the lower bounds l, and (ii) we approximate the eigenvector v from condition (4) of Lemma 2 and compute the "more informed" guesses u = l + εv, for various ε. Refining the guesses as original OVI does is not necessary (but see our remarks in Section 3.3 regarding floating point computations).

The functions improveLowerBound and approxEigenvec used in Algorithm 1 must satisfy the following contracts in order for the algorithm to be correct:


<sup>1</sup> In order for the Kleene sequence to converge to the lfp, i.e., lim<sub>k→∞</sub> l<sub>k</sub> = µφ, it suffices that φ is ω-continuous. This already implies monotonicity.

Algorithm 1: Optimistic Value Iteration (OVI) for PPS

- input: strongly connected clean PPS f; maximum abs. error ε ∈ ℚ<sub>>0</sub>
- output: a pair (l, u) of rational vectors s.t. l ≤ µf, f(u) ≤ u (hence µf ≤ u), and ‖l − u‖<sub>∞</sub> ≤ ε
- termination: guaranteed if f is feasible and I − f′(µf) is non-singular

- 1: l ← 0 ; N ← 0 ;
- 2: τ ← ε ; /\* τ is the current tolerance \*/
- 3: while true do
	- 4: l′ ← improveLowerBound(f, l) ; /\* e.g. Kleene or Newton update \*/
	- /\* guess and verify phase starts here \*/
	- 5: if ‖l − l′‖<sub>∞</sub> ≤ τ then
		- 6: v ← approxEigenvec(f′(l), τ) ; /\* recall v is normalized \*/
		- 7: for k from 0 to N do
			- 8: u ← l + d<sup>k</sup>ε · v ; /\* optimistic guess, d ∈ (0, 1) \*/
			- 9: if f(u) ≤ u then
				- 10: return (l, u) ; /\* guess was successful \*/
		- 11: N ← N + 1 ;
		- 12: τ ← c · τ ; /\* decrease tolerance for next guess, c ∈ (0, 1) \*/
	- 13: l ← l′ ;

In practice, both the Kleene and the Newton [16,17,12] update operator can be used to implement improveLowerBound. We outline a possible implementation of approxEigenvec further below in Section 3.3.
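To make the control flow of Algorithm 1 concrete, here is a minimal Python sketch using exact rational arithmetic. It is not the implementation of any tool; it rests on several assumptions that are not from the text: the two-variable PPS `f` is a hypothetical strongly connected example, its Jacobi matrix is derived by hand, improveLowerBound is a plain Kleene update, approxEigenvec runs a fixed number of power-iteration steps on f′(l) + I instead of using the tolerance τ, and c = d = 1/2.

```python
from fractions import Fraction as Fr


def f(p):
    """Hypothetical PPS: x = 1/2*x*y + 1/4, y = 1/4*x^2 + 1/4*y + 1/4."""
    x, y = p
    return (Fr(1, 2) * x * y + Fr(1, 4),
            Fr(1, 4) * x * x + Fr(1, 4) * y + Fr(1, 4))


def jacobian(p):
    """Jacobi matrix f'(p) of the PPS above (derived by hand)."""
    x, y = p
    return ((y / 2, x / 2), (x / 2, Fr(1, 4)))


def approx_eigenvec(m, steps=20):
    """Fixed number of power-iteration steps on M + I, normalized in the max norm."""
    n = len(m)
    v = [Fr(1)] * n
    for _ in range(steps):
        w = [v[i] + sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        top = max(w)
        v = [wi / top for wi in w]
    return v


def ovi(func, jac, eps, c=Fr(1, 2), d=Fr(1, 2)):
    """Return (l, u) with func(u) <= u and max|u - l| <= eps."""
    l = (Fr(0), Fr(0))
    n_guesses, tau = 0, eps
    while True:
        l_new = func(l)                                   # Kleene update (line 4)
        if max(abs(a - b) for a, b in zip(l, l_new)) <= tau:
            v = approx_eigenvec(jac(l))                   # line 6
            for k in range(n_guesses + 1):                # k = 0, ..., N (line 7)
                u = tuple(li + d**k * eps * vi for li, vi in zip(l, v))
                if all(fu <= ui for fu, ui in zip(func(u), u)):
                    return l, u                           # inductive guess found
            n_guesses += 1                                # line 11
            tau = c * tau                                 # line 12
        l = l_new                                         # line 13


l, u = ovi(f, jacobian, eps=Fr(1, 10))
print([float(x) for x in l], [float(x) for x in u])
```

Because all checks use `Fraction`, the returned u is an exact inductive upper bound. Exact arithmetic is only practical here because the example is tiny and terminates after a few lower-bound improvements.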

Example 3. Consider the following PPS f: $x = \frac{1}{4}x^2 + \frac{1}{8}$, $y = \frac{1}{4}xy + \frac{1}{4}y + \frac{1}{4}$. The table illustrates the execution of Algorithm 1 on f with ε = 0.1 and c = 0.5:


The algorithm has to improve the lower bound 3 times (corresponding to the 3 lines of the table). After the second improvement, the difference between the current lower bound l<sub>2</sub> and the new bound l′<sub>2</sub> does not exceed the current tolerance τ<sub>2</sub> = 0.1 and the algorithm enters the optimistic guessing stage. The first guess u<sub>2</sub> is not successful. The tolerance is then decreased to τ<sub>3</sub> = c·τ<sub>2</sub> = 0.05 and the lower bound is improved to l′<sub>3</sub>. The next guess u<sub>3</sub> is inductive.

Theorem 3. Algorithm 1 is correct: when invoked with a strongly connected clean PPS f and ε ∈ ℚ<sub>>0</sub>, then (if it terminates) it outputs a pair (l, u) of rational vectors s.t. l ≤ µf, f(u) ≤ u, and ‖l − u‖<sub>∞</sub> ≤ ε. Moreover, if f is feasible and I − f′(µf) is non-singular, then the algorithm terminates.

The proof of Theorem 3 (see [44]) crucially relies on condition (4) of Lemma 2 that assures the existence of a "truncated cone" of inductive bounds centered around the Perron-Frobenius eigenvector of f′(µf) (see Figure 2 for an illustration). Intuitively, since the lower bounds l computed by the algorithm approach the lfp µf, the eigenvectors of f′(l) approach those of f′(µf). As a consequence, it is guaranteed that the algorithm eventually finds an eigenvector that intersects the cone. The inner loop starting on line 7 is needed because the "length" of the cone is a priori unknown; the purpose of the loop is to scale the eigenvector down so that it is ultimately small enough to fit inside the cone.

### 3.3 Considerations for Implementing OVI

As said earlier, there are at least two options for improveLowerBound: Kleene or Newton iteration. We now show that approxEigenvec can be effectively implemented as well. Further below we comment on floating point arithmetic.

Approximating the Eigenvector. A possible implementation of approxEigenvec relies on the power iteration method (e.g. [38, Thm. 4.1]). Given a square matrix M and an initial vector v<sub>0</sub> with Mv<sub>0</sub> ≠ 0, power iteration computes the sequence (v<sub>i</sub>)<sub>i≥0</sub> such that for i > 0, v<sub>i</sub> = Mv<sub>i−1</sub> / ‖Mv<sub>i−1</sub>‖<sub>∞</sub>.

Lemma 3. Let M ≥ 0 be irreducible. Then power iteration applied to M + I and any v<sub>0</sub> > 0 converges to the Perron-Frobenius eigenvector v ≻ 0 of M.

The convergence rate of power iteration is determined by the ratio |λ<sub>2</sub>|/|λ<sub>1</sub>| where λ<sub>1</sub> and λ<sub>2</sub> are eigenvalues of largest and second largest absolute value, respectively. Each time approxEigenvec is called in Algorithm 1, the result of the previous call to approxEigenvec may be used as initial approximation v<sub>0</sub>.
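The role of the shift M + I in Lemma 3 can be illustrated with a small floating point sketch. The 2×2 matrix below is a hypothetical irreducible example chosen because its two eigenvalues, 1/2 and −1/2, have the same absolute value, so plain power iteration oscillates. The helper names `power_iteration` and `plus_identity` are illustrative assumptions.

```python
def power_iteration(m, v0, steps):
    """Compute v_i = M v_{i-1} / ||M v_{i-1}||_inf for the given number of steps."""
    v = list(v0)
    for _ in range(steps):
        w = [sum(mij * vj for mij, vj in zip(row, v)) for row in m]
        norm = max(abs(wi) for wi in w)
        v = [wi / norm for wi in w]
    return v


def plus_identity(m):
    """Return M + I for a square matrix given as a list of rows."""
    return [[mij + (1.0 if i == j else 0.0) for j, mij in enumerate(row)]
            for i, row in enumerate(m)]


M = [[0.0, 0.5], [0.5, 0.0]]   # irreducible, eigenvalues 1/2 and -1/2
v0 = [1.0, 0.5]

plain = power_iteration(M, v0, 50)                   # alternates between two vectors
shifted = power_iteration(plus_identity(M), v0, 50)  # converges to (1, 1)
print(plain, shifted)
```

On M itself the iterates alternate between (1, 0.5) and (0.5, 1), whereas on M + I, with eigenvalues 3/2 and 1/2, they converge to the Perron-Frobenius eigenvector (1, 1) at rate 1/3 per step.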

Exact vs Floating Point Arithmetic. So far we have assumed exact arithmetic for the computations in Algorithm 1, but an actual implementation should use floating point arithmetic for efficiency. However, this may lead to unsound results. More specifically, the condition f(u) ≤ u may hold in floating point arithmetic even though it is actually violated. As a remedy, we propose to nevertheless run the algorithm with floats, but then verify its output u with exact arbitrary-precision rational arithmetic. That is, we compute a rational number approximation u<sub>ℚ</sub> of u and check f(u<sub>ℚ</sub>) ≤ u<sub>ℚ</sub> with exact arithmetic. If the check fails, we resort to the following refinement scheme, which is an instance of the general k-induction principle for complete lattices from [5]: We iteratively check the conditions

$$f(u_{\mathbb{Q}} \sqcap f(u_{\mathbb{Q}})) \le u_{\mathbb{Q}}, \quad f(u_{\mathbb{Q}} \sqcap f(u_{\mathbb{Q}} \sqcap f(u_{\mathbb{Q}}))) \le u_{\mathbb{Q}}, \quad \text{and so on,}$$

where ⊓ denotes pointwise minimum. If one of the checks is satisfied, then µf ≤ u<sub>ℚ</sub> [5]. This scheme often works well in practice (see Section 5). The original OVI from [22] uses a similar technique to refine its guesses.
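The refinement scheme can be sketched in a few lines of Python with exact rationals. The two-variable PPS `f` and the candidate `u` below are hypothetical and were chosen so that the plain inductivity check fails while the first refined check succeeds. The function name `k_induction` and the bound `max_k` are assumptions made for illustration.

```python
from fractions import Fraction as Fr


def f(p):
    """Hypothetical PPS: x = 1/4*x^2 + 1/2, y = 1/4*x*y + 1/4*y + 1/4."""
    x, y = p
    return (Fr(1, 4) * x * x + Fr(1, 2),
            Fr(1, 4) * x * y + Fr(1, 4) * y + Fr(1, 4))


def leq(a, b):
    """Pointwise comparison a <= b."""
    return all(ai <= bi for ai, bi in zip(a, b))


def meet(a, b):
    """Pointwise minimum of two vectors."""
    return tuple(min(ai, bi) for ai, bi in zip(a, b))


def k_induction(func, u, max_k):
    """Return the smallest k <= max_k whose check succeeds, or None.

    k = 1 is the plain check func(u) <= u, k = 2 checks func(u ⊓ func(u)) <= u, etc.
    """
    w = u
    for k in range(1, max_k + 1):
        fw = func(w)
        if leq(fw, u):
            return k
        w = meet(u, fw)
    return None


u = (Fr(7, 10), Fr(43, 100))
print(leq(f(u), u), k_induction(f, u, max_k=5))
```

Here f(u) = (249/400, 1731/4000) violates the second component of u, but replacing the first component by 249/400 before applying f again brings the second component below 43/100, so the check for k = 2 passes.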

#### 4 Certificates for Probabilistic Pushdown Automata

This section shows how the results from Section 3 can be applied to pPDA. We introduce some additional notation. For finite sets A, D(A) denotes the set of probability distributions on A. In this section we often denote tuples without parentheses and commas, e.g., we may write ab rather than (a, b).

Definition 1 (pPDA [13]). A probabilistic pushdown automaton (pPDA) is a triple ∆ = (Q, Γ, P) where Q ≠ ∅ is a finite set of states, Γ ≠ ∅ is a finite stack alphabet, and P : Q × Γ → D(Q × Γ<sup>≤2</sup>) is a probabilistic transition function.

In the following, we often write $qZ \xrightarrow{p} r\alpha$ instead of P(qZ)(rα) = p [13]. Intuitively, $qZ \xrightarrow{p} r\alpha$ means that if the pPDA is in state q and Z is on top of the stack, then with probability p, the pPDA moves to state r, pops Z and pushes α on the stack. More formally, the semantics of a pPDA ∆ = (Q, Γ, P) is a countably infinite Markov chain with state space Q × Γ<sup>∗</sup> and transition probability matrix M such that for all q, r ∈ Q, Z ∈ Γ, α ∈ Γ<sup>≤2</sup>, γ ∈ Γ<sup>∗</sup>, we have

$$M(qZ\gamma, r\alpha\gamma) = P(qZ)(r\alpha)\,, \qquad M(q\varepsilon, q\varepsilon) = 1\,,$$

and all other transition probabilities are zero. This Markov chain, where the initial state is fixed to qZ, is denoted $\mathcal{M}_\Delta^{qZ}$ (see Figure 3 for an example). As usual, one can formally define a probability measure $\mathbb{P}_\Delta^{qZ}$ on the infinite runs of $\mathcal{M}_\Delta^{qZ}$ via the standard cylinder construction (e.g., [2, Sec. 10]).

Consider a triple qZr ∈ Q×Γ×Q. We define the return probability<sup>2</sup> [qZr] as the probability of reaching rε in the Markov chain $\mathcal{M}_\Delta^{qZ}$, i.e., [qZr] = $\mathbb{P}_\Delta^{qZ}$(♦{rε}), where ♦{rε} is the set of infinite runs of $\mathcal{M}_\Delta^{qZ}$ that eventually hit state rε.

Theorem 4 (The PPS of return probabilities [13]<sup>3</sup>). Let ∆ = (Q, Γ, P) be a pPDA and (⟨qZr⟩)<sub>qZr ∈ Q×Γ×Q</sub> be variables. For each ⟨qZr⟩, define

$$\langle qZr \rangle \;=\; \sum_{qZ \xrightarrow{p} sYX} p \cdot \sum_{t \in Q} \langle sYt \rangle \cdot \langle tXr \rangle \;+\; \sum_{qZ \xrightarrow{p} sY} p \cdot \langle sYr \rangle \;+\; \sum_{qZ \xrightarrow{p} r\varepsilon} p$$

and call the resulting PPS f<sub>∆</sub>. Then µf<sub>∆</sub> = ([qZr])<sub>qZr ∈ Q×Γ×Q</sub>.
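As an illustration of Theorem 4, the following Python sketch builds the right-hand sides of the return-probability PPS directly from a transition table and approximates its lfp by Kleene iteration in floating point. The table encodes the example pPDA of Figure 3. The dictionary layout, the function name `f_delta`, the restriction to single-character stack symbols, and the iteration count of 200 are assumptions made for this sketch.

```python
from itertools import product

# (state, top symbol) -> list of (probability, next state, pushed word)
P = {
    ("q", "Z"): [(0.25, "q", "ZZ"), (0.5, "q", ""), (0.25, "r", "")],
    ("r", "Z"): [(1.0, "r", "")],
}
Q, GAMMA = ["q", "r"], ["Z"]


def f_delta(val):
    """Apply the PPS of Theorem 4 once; val maps triples (q, Z, r) to numbers."""
    new = {}
    for q, z, r in product(Q, GAMMA, Q):
        total = 0.0
        for p, s, alpha in P[(q, z)]:
            if len(alpha) == 2:          # transition qZ -> sYX
                y, x = alpha
                total += p * sum(val[(s, y, t)] * val[(t, x, r)] for t in Q)
            elif len(alpha) == 1:        # transition qZ -> sY
                total += p * val[(s, alpha, r)]
            elif s == r:                 # transition qZ -> r with empty stack
                total += p
        new[(q, z, r)] = total
    return new


val = {key: 0.0 for key in product(Q, GAMMA, Q)}
for _ in range(200):                     # Kleene iteration from 0
    val = f_delta(val)
print(val)
```

The computed values approach 2 − √2 for ⟨qZq⟩ and √2 − 1 for ⟨qZr⟩. These are lower approximations only and do not constitute a certificate.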

Example 4. Figure 3 shows a pPDA ∆<sub>ex</sub> and the associated PPS f<sub>∆<sub>ex</sub></sub>. The least non-negative solution is ⟨qZq⟩ = 2 − √2 ≈ 0.586 and ⟨qZr⟩ = √2 − 1 ≈ 0.414 (and, of course, ⟨rZq⟩ = 0, ⟨rZr⟩ = 1). Thus by Theorem 4, the return probabilities are [qZq] = 2 − √2 and [qZr] = √2 − 1.

The PPS f<sub>∆</sub> is always feasible (because µf<sub>∆</sub> ≤ 1). However, f<sub>∆</sub> is neither necessarily strongly connected nor clean. Let f̂<sub>∆</sub> denote the cleaned up version of f<sub>∆</sub>.

<sup>2</sup> See [42] for an explanation of this terminology.

<sup>3</sup> We refer to [30, Sec. 3] for an intuitive explanation of the equations in f<sub>∆</sub>.

Fig. 3: Top left: The pPDA ∆<sub>ex</sub> = ({q, r}, {Z}, P) where P comprises the transitions $qZ \xrightarrow{1/4} qZZ$, $qZ \xrightarrow{1/2} q\varepsilon$, $qZ \xrightarrow{1/4} r\varepsilon$, $rZ \xrightarrow{1} r\varepsilon$. Top right: A fragment of the infinite underlying Markov chain $\mathcal{M}_\Delta^{qZ}$, assuming initial configuration qZ. Bottom: The associated equation system from Theorem 4.

Proposition 1 (Basic Certificates for pPDA). A basic certificate for ∆ = (Q, Γ, P) is a rational inductive upper bound u ∈ ℚ<sup>Q×Γ×Q</sup><sub>≥0</sub> on the lfp of the return probabilities system f<sub>∆</sub> (see Thm. 4). They have the following properties:


Existence of basic certificates follows from Lemma 2 applied to each SCC of the cleaned-up version of f<sub>∆</sub> individually. However, note that in order to merely check the certificate, i.e., verify the inequality f(u) ≤ u, neither do SCCs need to be computed nor does the system have to be cleaned up.

Example 5. Reconsider the example pPDA and its associated (non-strongly connected) system of return probabilities from Figure 3. We verify that u<sub>qZq</sub> = 3/5 and u<sub>qZr</sub> = 1/2 (as well as u<sub>rZq</sub> = 0, u<sub>rZr</sub> = 1) is a basic certificate:

$$\frac{1}{4}\left(\frac{3}{5}\cdot\frac{3}{5}+\frac{1}{2}\cdot 0\right)+\frac{1}{2} = \frac{59}{100}\stackrel{\checkmark}{\leq}\frac{3}{5}\quad,\quad\frac{1}{4}\left(\frac{3}{5}\cdot\frac{1}{2}+\frac{1}{2}\cdot 1\right)+\frac{1}{4} = \frac{45}{100}\stackrel{\checkmark}{\leq}\frac{1}{2}\,.$$

Note that [qZq] ≈ 0.586 ≤ 3/5 = 0.6 and [qZr] ≈ 0.414 ≤ 1/2 = 0.5.
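Checking a basic certificate amounts to evaluating each equation once with exact arithmetic. The following minimal Python sketch replays the check of Example 5. The dictionary keys and the hand-written function `f` (the four equations of the example system) are assumptions made for illustration.

```python
from fractions import Fraction as Fr

# Candidate certificate from Example 5
u = {"qZq": Fr(3, 5), "qZr": Fr(1, 2), "rZq": Fr(0), "rZr": Fr(1)}


def f(v):
    """Return-probability equations of the example pPDA, written out by hand."""
    return {
        "qZq": Fr(1, 4) * (v["qZq"] * v["qZq"] + v["qZr"] * v["rZq"]) + Fr(1, 2),
        "qZr": Fr(1, 4) * (v["qZq"] * v["qZr"] + v["qZr"] * v["rZr"]) + Fr(1, 4),
        "rZq": Fr(0),
        "rZr": Fr(1),
    }


fu = f(u)
is_certificate = all(fu[key] <= u[key] for key in u)
print(fu["qZq"], fu["qZr"], is_certificate)
```

The check needs neither SCC decomposition nor cleaning of the system. One evaluation of f and a pointwise comparison suffice.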

In the following we outline how a variety of key quantities associated with a pPDA can be verified using basic certificates.

Upper Bounds on Temporal Properties. We may use basic certificates to verify that a bad state r<sub>bad</sub> is reached with low probability, e.g., at most p = 0.01. To this end, we remove the outgoing transitions of r<sub>bad</sub> and add the transitions $r_{bad}Z \xrightarrow{1} r_{bad}\varepsilon$ for all Z ∈ Γ. Clearly, r<sub>bad</sub> is reached with probability at most p from initial configuration qZ iff [qZr<sub>bad</sub>] ≤ p. The results of [13] imply that this idea can be generalized to until-properties of the form C<sub>1</sub> U C<sub>2</sub>, where C<sub>1</sub> and C<sub>2</sub> are regular sets of configurations.

Certificates for the Output Distribution. Once a pPDA reaches the empty stack, we say that it has terminated. When modeling procedural programs, this corresponds to returning from a program's main procedure. Assuming initial configuration qZ, the probability sub-distribution over the possible return values is then given by the return probabilities {[qZr] | r ∈ Q}. Missing probability mass models the probability of non-termination. Therefore, a basic certificate may be used to prove a point-wise upper bound on the output distribution as well as the absence of almost-sure termination (AST). If a pPDA ∆ is known to be AST, then we can also certify a lower bound on the output distribution: Suppose that u is a basic certificate for ∆ and assume that ∆ is AST from initial configuration qZ. Define ε = Σ<sub>r∈Q</sub> u<sub>qZr</sub> − 1. Then for all r ∈ Q, we have u<sub>qZr</sub> − ε ≤ [qZr] ≤ u<sub>qZr</sub>.

Example 6. The pPDA ∆<sub>ex</sub> from Figure 3 is AST from initial configuration qZ, as the transition $qZ \xrightarrow{1/4} r\varepsilon$ is eventually taken with probability 1, and the stack is emptied certainly once r is reached. Using the basic certificate from Example 5 we can thus (correctly) certify that 0.5 ≤ [qZq] ≤ 0.6 and 0.4 ≤ [qZr] ≤ 0.5.
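The lower bounds follow from a short calculation: under AST the return probabilities from qZ sum to 1, so [qZr] equals 1 minus the sum of the other return probabilities, each of which is at most its certified upper bound. This yields [qZr] ≥ u<sub>qZr</sub> − ε. A minimal Python sketch of this step is given below. The function name `output_bounds` and the dictionary layout are assumptions for illustration, and AST itself is assumed rather than checked.

```python
from fractions import Fraction as Fr


def output_bounds(upper):
    """Map each return state r to (lower, upper) bounds on [qZr].

    `upper` holds the certified upper bounds u_{qZr} for all r in Q.
    Assumes that the pPDA is AST from the initial configuration qZ.
    """
    eps = sum(upper.values()) - 1
    return {r: (ub - eps, ub) for r, ub in upper.items()}


bounds = output_bounds({"q": Fr(3, 5), "r": Fr(1, 2)})
for r, (lo, hi) in bounds.items():
    print(f"{float(lo)} <= [qZ{r}] <= {float(hi)}")
```

With the certificate from Example 5, ε = 3/5 + 1/2 − 1 = 1/10, which reproduces the intervals stated in Example 6.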

Certificates for Expected Rewards. pPDA may also be equipped with a reward function Q → ℝ<sub>≥0</sub>. It was shown in [14] that the expected reward accumulated during the run of a pPDA is the solution of a linear equation system whose coefficients depend on the numbers [qZr]. Given a basic certificate u, we obtain an equation system whose solution is an over-approximation of the true expected reward (see [44]). We may extend the basic certificate u by the solution of this linear system to make verification straightforward. Note that a program's expected runtime [8,35] is a special case of total expected reward.

#### 5 Implementation and Experiments

Our Tool: pray. We implemented our algorithm in the prototypical Java-tool pray (Probabilistic Recursion AnalYzer) [43]. It supports two input formats: (i) Recursive probabilistic programs in a Java-like syntax (e.g. Figure 4); these programs are automatically translated to pPDA. (ii) Explicit PPS in the same syntax used by the tool PReMo [46]. The output of pray is a rational inductive upper bound on the lfp of the return probability PPS of the input program's pPDA model (a basic certificate), or on the lfp of the explicitly given PPS. The absolute precision ε is configurable. The implementation works as follows:


Baselines. To the best of our knowledge, no alternative techniques for finding inductive upper bounds in PPS have been described explicitly in the literature. However, there is an (almost) out-of-the-box approach using an SMT solver: Given a PPS x = f(x), compute some lower bound l ≤ µf using an iterative technique. Then query the SMT solver for a model (variable assignment) of the quantifier-free first-order logic formula ϕ<sub>f</sub>(x) = ⋀<sub>i=1</sub><sup>n</sup> (f<sub>i</sub>(x) ≤ x<sub>i</sub> ∧ l<sub>i</sub> ≤ x<sub>i</sub> ≤ l<sub>i</sub> + ε) in the (decidable) theory of polynomial real arithmetic with inequality (aka QF\_NRA in the SMT community). If such a model u exists, then clearly µf ≤ u and ‖l − u‖<sub>∞</sub> ≤ ε. If no model exists, then improve l and try again. We have implemented this approach using the state-of-the-art SMT solvers cvc5 [4] and z3 [34], the winners of the 2022 SMT-COMP in the category QF\_NRA<sup>5</sup>.
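To give an impression of the query ϕ<sub>f</sub>, the following Python sketch only generates an SMT-LIB 2 script for a given PPS; it does not call a solver. The function name `smt_query`, the encoding of the polynomials as pre-written SMT-LIB terms, and the concrete lower bounds and ε are assumptions chosen for illustration. The two equations are those of the example pPDA after substituting ⟨rZq⟩ = 0 and ⟨rZr⟩ = 1.

```python
def smt_query(variables, rhs, lower, eps):
    """Build an SMT-LIB 2 script asking for x with f(x) <= x and l <= x <= l + eps."""
    lines = ["(set-logic QF_NRA)"]
    lines += [f"(declare-const {x} Real)" for x in variables]
    for x in variables:
        lines.append(f"(assert (<= {rhs[x]} {x}))")              # f_i(x) <= x_i
        lines.append(f"(assert (<= {lower[x]} {x}))")            # l_i <= x_i
        lines.append(f"(assert (<= {x} (+ {lower[x]} {eps})))")  # x_i <= l_i + eps
    lines += ["(check-sat)", "(get-model)"]
    return "\n".join(lines)


script = smt_query(
    variables=["x", "y"],
    rhs={"x": "(+ (* 0.25 x x) 0.5)",
         "y": "(+ (* 0.25 x y) (* 0.25 y) 0.25)"},
    lower={"x": "0.58", "y": "0.41"},   # assumed lower bounds on the lfp
    eps="0.05",
)
print(script)
```

The resulting text can be passed to any solver that accepts SMT-LIB 2 input for QF_NRA. Whether and how fast a model is found depends on the solver.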

As yet another baseline, we have also implemented a variant of OVI for PPS which is closer to the original MDP algorithm from [22]. In this variant, called "standard OVI" from now on, we compute the candidate u based on the relative update rule u = (1 + ε)l, where l is the current lower bound [22].

Research Questions. We aim to shed some light on the following questions: (A) How well does our algorithm scale? (B) Is the algorithm suitable for PPS with different characteristics, e.g., dense or sparse? (C) Is the requirement ρ(f′(µf)) < 1 restrictive in practice? (D) How does our OVI compare to the baselines?

Benchmarks. To answer the above questions we run our implementation on two sets of benchmarks (Table 1 and Table 2, respectively). The first set consists of various example programs from the literature as well as a few new programs, which are automatically translated to pPDA. This translation is standard and usually takes no more than a few seconds. The programs golden, and-or (see Figure 4), virus, gen-fun are adapted from [35,8,42] and [32, Program 5.6], respectively. The source code of all considered programs is in [44]. We have selected only programs with possibly unbounded recursion depth which induce infinite Markov chains. The second benchmark set comprises explicit PPS from [46]. The instances brown, lemonde, negra, swbd, tiger, tuebadz, and wsj all encode SCFG

<sup>4</sup> In fact, we use the slightly optimized Gauss-Seidel iteration (see [45, Sec. 5.2]) which provides a good trade-off between ease of implementation and efficiency [45].

<sup>5</sup> https://smt-comp.github.io/2022/results

```
bool and() {
  prob {
    1//2: return
      (1//2: true | 1//2: false);
    1//2: {
      if(!or()) return false;
      else return or(); } } }

bool or() {
  prob {
    1//2: return
      (1//2: true | 1//2: false);
    1//2: {
      if(and()) return true;
      else return and(); } } }
```
Fig. 4: Program evaluating a random and-or tree [8]. The prob-blocks execute the contained statements with the respective probabilities (syntax inspired by Java's switch). Our tool automatically translates this program to a pPDA and computes a basic certificate (Proposition 1) witnessing that calling and() returns true and false with probability ≤ 382/657 ≈ 0.58 and 391/933 ≈ 0.42, resp.

from the area of language processing (see [46] for details). random is the return probability system of a randomly generated pPDA.

Summary of Results. We ran the experiments on a standard notebook. The approach based on cvc5 turns out to be not competitive (see [44]). We thus focus on z3 in the following. Both pray and the z3 approach handle most of the programs from Table 1 within a 10 minute time limit. The considered programs induce sparse PPS with 38 to 26,367 variables, and most of them have just a single SCC. Notably, the examples with greatest maximum SCC size are only solved by z3. pray and z3 need at most 95 and 31 seconds, respectively, for the instances where they succeed. In many cases (e.g., rw-5.01, golden, virus, brown, swbd), the resulting certificates formally disprove AST. For the explicit PPS in Table 2, pray solves all instances whereas z3 only solves 3/8 within the time limit, and only finds the trivial solution 1. Most of these benchmarks contain dense high-degree polynomials, and our tool spends most time on performing exact arithmetic. Standard OVI (rightmost columns in Tables 1 and 2) solves strictly fewer instances than our eigenvector-based OVI. On some instances, Standard OVI is slightly faster (if it succeeds). However, on some larger benchmarks (brown, swbd) our variant runs ≈ 3× faster.

Evaluation of Research Questions. (A) Scalability: Our algorithm succeeds on instances with maximum SCC size of up to 8,000 and number of terms over 50,000. pray solves all instances with a maximum SCC size of ≤ 1,000 in less than 2 minutes per instance. For the examples where our algorithm does not succeed (e.g., escape100), it is mostly because it fails to convert a floating point certificate to a rational one. (B) PPS with different flavors: The problems in Table 1 (low degree and sparse, i.e., few terms per polynomial) and Table 2 (higher degree and dense) are quite different. A comparison to the SMT approach suggests that our technique might be especially well suited for dense problems with higher degrees. (C) Non-singularity: The only instance where our algorithm fails because of the non-singularity condition is the symmetric random walk rw-0.500. We therefore conjecture that this condition is often satisfied in practice. (D) Comparison with baselines: There is no clear winner. Some instances can only

Table 1: Experiments with PPS obtained from recursive probabilistic programs. Columns vars and terms display the number of variables and terms in the PPS. Columns sccs and scc<sub>max</sub> indicate the number of non-trivial SCCs and the size of the largest SCC. G is the total number of guesses made by OVI (at least one guess per SCC). t<sub>tot</sub> is the total runtime excluding the time for model construction. t<sub>ℚ</sub> is the percentage of t<sub>tot</sub> spent on exact rational arithmetic. D is the average number of decimal digits of the rational numbers in the certificate. The timeout (TO) was set to 10 minutes. Time is in ms. The absolute precision is ε = 10<sup>−3</sup>.


Table 2: Experiments with explicitly given PPS (setup as in Table 1).


be solved by one tool or the other (e.g., escape100 and brown). However, pray often delivers more succinct certificates, i.e., the rational numbers have fewer digits. Moreover, z3 behaves much less predictably than pray.

### 6 Conclusion and Future Work

We have proposed using inductive bounds as certificates for various properties in probabilistic recursive models, and presented the first dedicated algorithm for computing such bounds. Our algorithm already scales to non-trivial problems. A remaining bottleneck is the need for exact rational arithmetic. This might be improved using appropriate rounding modes as in [21]. Additional future work includes certificates for lower bounds and termination.

Data availability statement The datasets generated during and/or analysed during the current study are available in the Zenodo repository [43].

### References



## Probabilistic Program Verification via Inductive Synthesis of Inductive Invariants<sup>⋆</sup>

Kevin Batz<sup>1(✉)</sup>, Mingshuai Chen<sup>2(✉)</sup>, Sebastian Junges<sup>3(✉)</sup>, Benjamin Lucien Kaminski<sup>4(✉)</sup>, Joost-Pieter Katoen<sup>1(✉)</sup>, and Christoph Matheja<sup>5(✉)</sup>

> <sup>1</sup> RWTH Aachen University, Aachen, Germany, {kevin.batz,katoen}@cs.rwth-aachen.de
>
> <sup>2</sup> Zhejiang University, Hangzhou, China, m.chen@zju.edu.cn
>
> <sup>3</sup> Radboud University, Nijmegen, Netherlands, sebastian.junges@ru.nl
>
> <sup>4</sup> Saarland University, Saarbrücken, Germany and University College London, London, United Kingdom, kaminski@cs.uni-saarland.de
>
> <sup>5</sup> Technical University of Denmark, Kgs. Lyngby, Denmark, chmat@dtu.dk

Abstract. Essential tasks for the verification of probabilistic programs include bounding expected outcomes and proving termination in finite expected runtime. We contribute a simple yet effective inductive synthesis approach for proving such quantitative reachability properties by generating inductive invariants on source-code level. Our implementation shows promise: It finds invariants for (in)finite-state programs, can beat state-of-the-art probabilistic model checkers, and is competitive with modern tools dedicated to invariant synthesis and expected runtime reasoning.

### 1 Introduction

Reasoning about reachability probabilities is a foundational task in the analysis of randomized systems. Such systems are (possibly infinite-state) Markov chains, which are typically described as probabilistic programs – imperative programs that may sample from probability distributions. We contribute a method for proving bounds on quantitative properties of probabilistic programs, which finds inductive invariants on source-code level by inductive synthesis. We discuss each of these ingredients below, present our approach with a running example in Sect. 2, and defer a detailed discussion of related work to Sect. 8.

1) Quantitative Reachability Properties. We aim to verify properties such as "is the probability of reaching an error at most 1%?" More generally, our technique proves bounds on the expected value of a probabilistic program terminating in designated states (see Sect. 2.1). Various verification problems are ultimately

© The Author(s) 2023

<sup>⋆</sup> This research was funded by the ERC AdG FRAPPANT under grant No. 787914.

S. Sankaranarayanan and N. Sharygina (Eds.): TACAS 2023, LNCS 13994, pp. 410–429, 2023. https://doi.org/10.1007/978-3-031-30820-8_25

Fig. 1: Our CEGIS framework for synthesizing quantitative inductive invariants.

solved by bounding quantitative reachability properties (cf. [7,47]). Further examples of such problems include "does a program terminate with finite expected runtime?" and "is the expected sum of program variables x and y at least one?"

2) Inductive Invariants. An inductive invariant is a certificate that witnesses a certain quantitative reachability property. Quantitative (and qualitative) reachability are typically captured as least fixed points (cf. [52,47,7]). For upper bounds, this characterization makes it natural to search for a prefixed point – the inductive invariant – that, by standard fixed point theory [56], is greater than or equal to the least fixed point. Our invariants assign every state a quantity. If the initial state is assigned a quantity below the desired threshold, then the invariant certifies that the property in question holds. We detail quantitative inductive invariants in Sect. 2.2; we adapt our method to lower bound reasoning in Sect. 6.

3) Source-Code Level. We consider probabilistic programs over (potentially unbounded) integer variables that conceptually extend while-programs with coin flips, see e.g. Fig. 2.<sup>6</sup> We exploit the program structure to reason about infinite-state (and large finite-state) programs: We never construct a Markov chain but find symbolic inductive invariants (mappings from program states to nonnegative reals) on source-code level. We particularly discover inductive invariants that are piecewise linear, as they can often be verified efficiently.

4) Inductive Synthesis. Our approach to finding invariants, as sketched in Fig. 1, is inspired by inductive synthesis [4]: The inner loop (shaded box) is provided with a template T which may generate an infinite set ⟨T⟩ of instances. We then synthesize a template instance I that is an inductive invariant witnessing quantitative reachability, or determine that no such instance exists. We search for such instances in a counterexample-guided inductive synthesis (CEGIS) loop: The synthesizer constructs a candidate. (A tailored variant of) an off-the-shelf verifier either (i) decides that the candidate is a suitable inductive invariant or (ii) reports a counterexample state s back to the synthesizer. Upon termination (guaranteed for finite-state programs), the inner loop has either found an inductive invariant or the solver reports that the template T does not admit an inductive invariant.

Contributions. We show that inductive synthesis for verifying quantitative reachability properties by finding inductive invariants on source-code level is

<sup>6</sup> Prism programs can be interpreted as an implicit while(not error-state) {...} program – see [40] for an explicit translation.

1: fail := 0 ; sent := 0 ;
2: while ( sent < 8 000 000 ∧ fail < 10 ) {
3:   { fail := 0 ; sent := sent + 1 } [ 0.999 ] { fail := fail + 1 } }

Fig. 2: Model for the bounded retransmission protocol (BRP).

feasible: Our approach is sound for arbitrary probabilistic programs, and complete for finite-state programs. We implemented our simple yet powerful technique. The results are promising: Our CEGIS loop is sufficiently fast to support large templates and finds inductive invariants for various probabilistic programs and properties. It can prove, amongst others, upper and lower bounds on reachability probabilities and universal positive almost-sure termination [42]. Our implementation is competitive with three state-of-the-art tools – Storm [39], Absynth [50], and Exist [9] – on subsets of their benchmarks fitting our framework.

Applicability and Limitations. We consider programs with possibly unbounded nonnegative integer-valued variables and arbitrary affine expressions in quantitative specifications. As for other synthesis-based approaches, there are unrealizable cases – loops for which no piecewise linear invariant exists. But, if there is an invariant, our CEGIS loop often finds it within a few iterations.

### 2 Overview

We illustrate our approach using the bounded retransmission protocol (BRP) – a standard probabilistic model checking benchmark [38,28] – modeled by the probabilistic program in Fig. 2. The model attempts to transmit 8 million packets<sup>7</sup> over a lossy channel, where each packet is lost with probability 0.1%; if a packet is lost, we retry sending it; if any packet is lost in 10 consecutive sending attempts (fail = 10), the entire transmission fails; if all packets have been transmitted successfully (sent = 8 000 000), the transmission succeeds.

### 2.1 Reachability Probabilities and Loops

We aim to reason about the transmission-failure probability of BRP, i.e. the probability that the loop terminates in a target state t with t(fail) = 10 when started in initial program state s<sub>0</sub> with s<sub>0</sub>(fail) = s<sub>0</sub>(sent) = 0. One approach to determine this probability is to (i) construct an explicit-state Markov chain (MC) per Fig. 2, (ii) derive its Bellman operator Φ [52], (iii) compute its least fixed point lfp Φ (a vector containing for each state the probability to reach t), e.g. using value iteration (cf. [7, Thm 10.15]), and finally (iv) evaluate lfp Φ at s<sub>0</sub>.

The explicit-state MC of BRP has ca. 80 million states. We avoid building such large state spaces by computing a symbolic representation of Φ from the

<sup>7</sup> Large constants like the number of packets appear naturally in quantitative models of protocols and have a non-trivial impact on probabilities.

program. More formally, let S be the set of all states, loop the entire loop (ll. 2–3 in Fig. 2), body the loop's body (l. 3), and ⟦body⟧(s)(s′) the probability of reaching state s′ by executing body once on state s. Then the least fixed point of the loop's Bellman operator Φ: (S → ℝ<sup>∞</sup><sub>≥0</sub>) → (S → ℝ<sup>∞</sup><sub>≥0</sub>), defined by

$$\Phi(I) = \lambda s. \begin{cases} 1, & \text{if } s(fail) = 10 \\ \sum_{s' \in S} [\![body]\!](s)(s') \cdot I(s'), & \text{if } s(sent) < 8\,000\,000 \text{ and } s(fail) < 10 \\ 0, & \text{otherwise} \end{cases}$$

captures the transmission-failure probability for the entire execution of loop and for any initial state, that is, (lfp Φ)(s) is the probability of terminating in a target state when executing loop on s (even if loop would not terminate almost-surely). Intuitively, Φ(I)(s) maps to 1 if loop has terminated meeting the target condition (transmission failure); and to 0 if loop has terminated otherwise (transmission success). If loop is still running (i.e. it has neither failed nor succeeded yet), then Φ(I)(s) maps to the expected value of I after executing body on state s.
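For intuition, the operator Φ can be spelled out explicitly on a scaled-down instance. The following Python sketch enumerates all states, which is exactly what the symbolic approach avoids, and is meant only as an illustration. The constants (3 packets, failure after 2 consecutive losses), the name `phi`, and the dictionary representation of I are assumptions. The success probability 0.999 is taken from Fig. 2.

```python
from fractions import Fraction as Fr

N, F = 3, 2                 # scaled-down stand-ins for 8 000 000 and 10
P_OK = Fr(999, 1000)        # probability that a packet is transmitted

STATES = [(sent, fail) for sent in range(N + 1) for fail in range(F + 1)]


def phi(I):
    """Bellman operator of the loop, applied to a candidate I (dict over states)."""
    out = {}
    for sent, fail in STATES:
        if fail == F:                       # terminated in a target state
            out[(sent, fail)] = Fr(1)
        elif sent < N:                      # loop guard holds: run the body once
            out[(sent, fail)] = (P_OK * I[(sent + 1, 0)]
                                 + (1 - P_OK) * I[(sent, fail + 1)])
        else:                               # terminated successfully
            out[(sent, fail)] = Fr(0)
    return out


I = {s: Fr(0) for s in STATES}
for _ in range(10):     # every run of the small model ends within N * F = 6 iterations
    I = phi(I)
failure_prob = I[(0, 0)]
print(failure_prob, float(failure_prob))
```

Since each packet fails with probability (1/1000)² in this small model, the value at the initial state equals 1 − (1 − 10⁻⁶)³. After ten applications the iterate no longer changes, i.e. it is the least fixed point.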

#### 2.2 Quantitative Inductive Invariants

Reachability probabilities are generally not computable for infinite-state probabilistic programs [43]. Even for finite-state programs the state-space explosion may prevent us from computing reachability probabilities exactly. However, it often suffices to know that the reachability probability is bounded from above by some threshold λ. For BRP, we hence aim to prove that (lfp Φ)(s0) ≤ λ.

We attack the above task by means of (quantitative) inductive invariants: a candidate for an inductive invariant is a mapping I : S → R<sup>∞</sup><sub>≥0</sub>. Intuitively, such a candidate I is inductive if the following holds: when assuming that I(s) is (an over-approximation of) the probability to reach a target state upon termination of loop on s, then the probability to reach a target state after performing one more guarded loop iteration, i.e. executing if ( sent < . . .) { body ; loop } on s, must be at most I(s). Formally, I is an inductive invariant<sup>8</sup> if

∀s: Φ(I)(s) ≤ I(s) which implies ∀s: (lfp Φ)(s) ≤ I(s)

by Park induction [51]. Hence, I(s) bounds for each initial state s the exact reachability probability from above. If we are able to find an inductive I that is below λ for the initial state s<sub>0</sub> with fail = sent = 0, i.e. I(s<sub>0</sub>) ≤ λ, then we have indeed proven the upper bound λ on the transmission-failure probability of our BRP model. In a nutshell, our goal can be phrased as follows:

Goal: Find an inductive invariant I, i.e. an I with Φ(I) ≤ I, s.t. I(s0) ≤ λ.

<sup>8</sup> For an exposition of why it makes sense to speak of invariants even in a quantitative setting, [42, Sect. 5.1] relates quantitative invariants to invariants in Hoare logic.

#### 2.3 Our CEGIS Framework for Synthesizing Inductive Invariants

While finding a safe inductive invariant I is challenging, checking whether a given candidate I is indeed inductive is easier: it is decidable for certain infinite-state programs (cf. [14, Sect. 7.2]), it may not require an explicit exploration of the whole state space, and it can be done efficiently for piecewise linear I. Hence, techniques that generate decent candidate expressions fast and then check their inductivity could enable the automatic verification of probabilistic programs with gigantic and even infinite state spaces.

In this paper, we test this hypothesis by developing the CEGIS framework depicted in Fig. 1 for incrementally synthesizing inductive invariants. A template generator generates parametrized templates for inductive invariants. The inner loop (shaded box in Fig. 1) then tries to solve for appropriate template-parameter instantiations. If it succeeds, an inductive invariant has been synthesized. Otherwise, the template provably cannot be instantiated into an inductive invariant. The inner loop then reports that back to the template generator (possibly with some hint on why it failed, see [12, Appx. D]) and asks for a refined template.

For our running example, we start with the template

$$T = [fail < 10 \land sent < 8\,000\,000] \cdot (\alpha \cdot sent + \beta \cdot fail + \gamma) \; + \; [fail = 10] \, , \tag{1}$$

where we use Iverson brackets for indicators, i.e. [ϕ](s) = 1 if s |= ϕ and 0 otherwise. T contains two kinds of variables: integer program variables fail, sent and Q-valued parameters α, β, γ. While the template is nonlinear, substituting α, β, γ with concrete values yields piecewise linear candidate invariants I. We ensure that those I are piecewise linear to render the repeated inductivity checks efficient. We construct only so-called natural templates T with Φ in mind, e.g. we want to construct only I such that I(s) = 1 when s(fail) = 10.

Our inner CEGIS loop checks whether there exists an assignment from these template variables to concrete values such that the resulting piecewise linear expression is an inductive invariant. Concretely, we try to determine whether there exist values for α, β, γ such that T(α, β, γ) is inductive. For that, we first guess values for α, β, γ, say all 0's, and ask a verifier whether the instantiated (and now piecewise linear) template I = T(0, 0, 0) is indeed inductive. In our example, the verifier determines that I is not inductive: a counterexample is s(fail) = 9, s(sent) = 7999999. Intuitively, the probability to reach the target after one more loop iteration exceeds the value in I for this state, that is, Φ(I)(s) = 0.001 > 0 = I(s). From this counterexample, our synthesizer learns

$$\Phi(T)(s) \;=\; 0.001 \stackrel{!}{\leq} \alpha \cdot 7999999 + \beta \cdot 9 + \gamma \;=\; T(s)\;.$$

Observe that this learned lemma is linear in α, β, γ. The synthesizer will now keep "guessing" assignments to the parameters which are consistent with the learned lemmas until either no such parameter assignment exists anymore, or until it produces an inductive invariant I = T(. . .). In our running example, assuming λ = 0.9, after 6 lemmas, our synthesizer finds the inductive invariant I

$$\left[fail < 10 \land sent < 8\,000\,000\right] \cdot \left(-\frac{9}{8\cdot 10^7} \cdot sent + \frac{79\,991}{72\cdot 10^7} \cdot fail + \frac{9}{10}\right) + \left[fail = 10\right] \tag{2}$$

Fig. 3: A bounded retransmission protocol family and piece of a matching invariant.

where indeed I(s0) ≤ λ holds. For a tighter threshold λ, such simple templates do not suffice. For example, it is impossible to instantiate this template to an inductive invariant for λ = 0.8, even though 0.8 is an upper bound on the actual reachability probability. We therefore support more general templates of the form

$$T = \sum\_{i} [B\_i] \cdot (\alpha\_i \cdot sent + \beta\_i \cdot fail + \gamma\_i) \quad + \quad [fail = 10] \; ,$$

where the B<sub>i</sub> are (restricted) predicates over program and template variables which partition the state space. In particular, we allow for a template such as

$$\begin{aligned} T \;=\;\; & [fail < 10 \land sent < \delta] \cdot (\alpha\_1 \cdot sent + \beta\_1 \cdot fail + \gamma\_1) \\ {}+{} & [fail < 10 \land sent \ge \delta] \cdot (\alpha\_2 \cdot sent + \beta\_2 \cdot fail + \gamma\_2) \\ {}+{} & [fail = 10] \; . \end{aligned} \tag{3}$$

However, such templates are challenging for the CEGIS loop. Thus, we additionally consider templates where the B<sub>i</sub>'s range only over program variables, e.g.

$$[fail < 10 \land sent < 4\,000\,000] \cdot (\dots) \, + \, [fail < 10 \land sent \ge 4\,000\,000] \cdot (\dots) \, + \, \dots$$

Our partition refinement algorithms automatically produce these templates, without the need for user interaction.
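As a sanity check of the invariant in Eq. (2), the following Python sketch evaluates Φ(I) and I with exact rationals on a handful of states. It is only a spot check on sampled states under the loop semantics assumed from Sect. 2.1 (success with probability 0.999 resets fail and increments sent; failure with probability 0.001 increments fail), not a proof of inductivity. The function names are hypothetical.

```python
from fractions import Fraction

P, R = 8_000_000, 10                        # packets and retransmission bound of the running example
ALPHA = Fraction(-9, 8 * 10**7)
BETA = Fraction(79_991, 72 * 10**7)
GAMMA = Fraction(9, 10)

def invariant(sent, fail):
    """The candidate of Eq. (2): one linear piece inside the guard, [fail = 10] outside."""
    if fail < R and sent < P:
        return ALPHA * sent + BETA * fail + GAMMA
    return Fraction(int(fail == R))

def phi(I, sent, fail):
    """One application of the characteristic function to I at state (sent, fail)."""
    if fail < R and sent < P:
        return Fraction(999, 1000) * I(sent + 1, 0) + Fraction(1, 1000) * I(sent, fail + 1)
    return Fraction(int(fail == R))

# Sampled states: both ends and the middle of the sent range, all values of fail.
sample = [(sent, fail)
          for sent in (0, 1, 4_000_000, 7_999_998, 7_999_999, 8_000_000)
          for fail in range(R + 1)]
violations = [s for s in sample
              if not (0 <= invariant(*s) and phi(invariant, *s) <= invariant(*s))]
initial_value = invariant(0, 0)
```

On this sample there are no violations, the initial state evaluates to exactly 9/10, and the earlier counterexample state (sent = 7999999, fail = 9) is tight: both Φ(I) and I evaluate to 1/1000 there.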

Finally, we highlight that we may use our approach for more general questions. For BRP, suppose we want to verify an upper bound λ = 0.05 on the probability of failing to transmit all packages for an infinite set of models (also called a family) with varying upper bounds on packets 1 ≤ P ≤ 8000000 and retransmissions R ≥ 5. This infinite set of models is described by the loop shown in Fig. 3a. Our approach fully automatically synthesizes the following inductive invariant I:

$$\begin{aligned} & \left[\, fail < R \wedge sent < P \wedge P < 8\,000\,000 \wedge R \ge 5 \wedge R > 1 + fail \wedge \frac{13067990199}{5280132671650} \cdot fail \le \frac{5278689867}{211205306866000} \,\right] \\ & \qquad \cdot \left( \frac{-19}{3820000040} \cdot sent + \frac{19}{3820000040} \cdot P + \frac{19500001}{1910000020} \right) \\ & + \quad \dots \text{ (7 additional summands omitted)} \end{aligned}$$

The first summand of I is plotted in Fig. 3b. Since I overapproximates the probability of failing to transmit all packages for every state, I may be used to infer additional information about the reachability probabilities.

### 3 Formal Problem Statement

Before we state the precise invariant synthesis problem that we aim to solve, we summarize the essential concepts underlying our formalization.

Probabilistic Loops. We consider single probabilistic loops while ( ϕ ) { C } whose loop guard ϕ and (loop-free) body C adhere to the grammar

$$\begin{array}{ccccc} C & \longrightarrow & \textbf{skip} \mid x := e \mid C; C \mid \, \{C\} \, [p] \, \{C\} \mid \, \mathtt{if} \, (\varphi) \, \{C\} \, \mathtt{else} \, \{C\} \\\varphi & \longrightarrow & e < e \mid \, \neg \varphi \mid \, \varphi \wedge \varphi \qquad \quad e \quad \longrightarrow \, z \mid x \mid z \cdot e \mid e + e \,, \end{array}$$

where z ∈ Z is a constant and x is from an arbitrary finite set Vars of N-valued program variables. Program states in S = { s | s: Vars → N } map variables to natural numbers.<sup>9</sup> All statements are standard (cf. [47]). { C<sub>1</sub> } [ p ] { C<sub>2</sub> } is a probabilistic choice which executes C<sub>1</sub> with probability p ∈ [0, 1] ∩ Q and C<sub>2</sub> with probability 1 − p. Fig. 2 (ll. 2–3) is an example of a probabilistic loop.

Expectations. In Sect. 2, we considered whether final states meet some target condition by assigning 0 or 1 to each final state. The assignment can be generalized to more general quantities in R<sup>∞</sup><sub>≥0</sub>. We call such assignments f expectations [47] (think: random variable) and collect them in the set E, i.e.

$$\mathbb{E} = \left\{ f \; \middle| \; f \colon S \to \mathbb{R}\_{\geq 0}^{\infty} \right\} \; , \qquad \text{where} \qquad f \; \preceq \; g \quad \text{iff} \quad \forall s \in S \colon f(s) \leq g(s)$$

defines a partial order ⪯ on E, which is necessary to sensibly speak about least fixed points.

Characteristic Functions. The expected behavior of a probabilistic loop for an expectation f is captured by an expectation transformer (namely the Φ: E → E of Sect. 2), called the loop's characteristic function. To focus on invariant synthesis, we abstract from the details<sup>10</sup> of constructing characteristic functions from probabilistic loops; our framework only requires the following key property:

Proposition 1 (Characteristic Functions). For every loop while ( ϕ ) { C } and expectation f, there exists a monotone function Φ<sub>f</sub> : E → E such that

$$\Phi\_f(I)(s) = \begin{cases} f(s), & \text{if } s \not\models \varphi \text{ ,} \\ \text{``expected value of } I \text{ after executing } C \text{ once on } s\text{''}, & \text{if } s \models \varphi \text{ ,} \end{cases}$$

and the least fixed point of Φ<sub>f</sub>, denoted lfp Φ<sub>f</sub>, satisfies

 (lfp Φ<sub>f</sub>)(s) = "expected value of f after executing while ( ϕ ) { C } on s" .

<sup>9</sup> Considering only unsigned integers does not decrease expressive power but simplifies the technical presentation (cf. [16, Sect. 11.2] for a detailed discussion). We statically ensure that for every assignment x := e, e always evaluates to some value in N.

<sup>10</sup> We can (and our tool does) derive a symbolic representation of a loop's characteristic function from the program structure using a weakest-precondition-style calculus (cf. [47]); see [12, Appx. A] for details. If f maps only to 0 or 1, Φ<sub>f</sub> corresponds to the least fixed point characterization of reachability probabilities [7, Thm. 10.15].

Example 1. In our running example from Sect. 2.1, we chose as f the expression [fail = 10], which evaluates to 1 in every state s where fail = 10 and to 0 otherwise. The characteristic function Φ<sub>f</sub>(I) of the loop in Fig. 2 is

[¬ϕ] · [fail = 10] + [ϕ] · (0.999 · I [sent/sent+1] [fail/0] + 0.001 · I [fail/fail+1]) ,

where ϕ = sent < 8 000 000 ∧ fail < 10 is the loop guard and I [x/e] denotes the (syntactic) substitution of variable x by expression e in expectation I; the latter is used to model the effect of assignments as in standard Hoare logic.
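The characteristic function of Example 1 can also be obtained semantically, by pushing an expectation backwards through the loop body. The Python sketch below is a minimal weakest-precondition-style interpreter for the grammar of probabilistic loops. The tuple encoding of programs, the function names `wp` and `characteristic_function`, and the concrete body (reconstructed from the substitutions in Example 1) are assumptions made for illustration; the tool itself works symbolically rather than on Python closures.

```python
from fractions import Fraction

# Hypothetical program encoding as nested tuples:
#   ("skip",), ("assign", x, e), ("seq", C1, C2), ("choice", p, C1, C2), ("ite", b, C1, C2)
# Expressions e and guards b are Python functions on states (dicts from names to naturals).

def wp(C, I):
    """Expected value of I after executing the loop-free program C, as a function of the start state."""
    tag = C[0]
    if tag == "skip":
        return I
    if tag == "assign":
        _, x, e = C
        return lambda s: I({**s, x: e(s)})             # substitution I[x/e]
    if tag == "seq":
        _, C1, C2 = C
        return wp(C1, wp(C2, I))
    if tag == "choice":
        _, p, C1, C2 = C
        left, right = wp(C1, I), wp(C2, I)
        return lambda s: p * left(s) + (1 - p) * right(s)
    if tag == "ite":
        _, b, C1, C2 = C
        then, other = wp(C1, I), wp(C2, I)
        return lambda s: then(s) if b(s) else other(s)
    raise ValueError(f"unknown statement: {tag}")

def characteristic_function(guard, body, f):
    """Phi_f(I)(s) = f(s) if the guard fails, else the expected value of I after one body execution."""
    def phi(I):
        step = wp(body, I)
        return lambda s: step(s) if guard(s) else f(s)
    return phi

# Running example; the body is an assumption reconstructed from Example 1.
guard = lambda s: s["sent"] < 8_000_000 and s["fail"] < 10
body = ("choice", Fraction(999, 1000),
        ("seq", ("assign", "fail", lambda s: 0), ("assign", "sent", lambda s: s["sent"] + 1)),
        ("assign", "fail", lambda s: s["fail"] + 1))
f = lambda s: Fraction(int(s["fail"] == 10))
phi = characteristic_function(guard, body, f)

candidate = f                                          # the instance T(0, 0, 0) of Eq. (1) equals [fail = 10]
cex = {"sent": 7_999_999, "fail": 9}
```

Evaluating `phi(candidate)(cex)` reproduces the value Φ(I)(s) = 0.001 from Sect. 2.3, which exceeds `candidate(cex)` = 0 and hence refutes inductivity of the all-zero instantiation.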

Inductive Invariants. For a probabilistic loop while ( ϕ ) { C }, and pre- and postexpectations g, f ∈ E, we aim to verify lfp Φ<sub>f</sub> ⪯ g, i.e. that the expected value of f after termination of the loop is bounded from above by g. We discuss how to adapt our approach to expected runtimes and lower bounds in Sect. 6. Intuitively, f assigns a quantity to all target states reached upon termination, and g assigns to all initial states a desired bound on the expected value of f after termination of the loop. By choosing g(s) = ∞ for certain s, we can make s so-to-speak "irrelevant". An I ∈ E is an inductive invariant proving lfp Φ<sub>f</sub> ⪯ g iff Φ<sub>f</sub>(I) ⪯ I and I ⪯ g. Continuing our example, Eq. (2) on p. 5 shows an inductive invariant proving that lfp Φ<sub>f</sub> ⪯ g := [fail = 0 ∧ sent = 0] · 0.9 + [¬(fail = 0 ∧ sent = 0)] · ∞.

Our framework employs syntactic fragments of expectations on which the check Φ<sub>f</sub>(I) ⪯ I can be done symbolically by an SMT solver. As illustrated in Fig. 1, we use templates to further narrow down the invariant search space.

Templates. Let TVars = {α, β, . . .} be a countably infinite set of Q-valued template variables. A template valuation is a function ℑ: TVars → Q that assigns to each template variable a rational number. We will use the same expressions as in our programs except that we admit both rationals and template variables as coefficients. Formally, arithmetic and Boolean expressions E and B adhere to

$$E \;\longrightarrow\; r \mid x \mid r \cdot x \mid E + E \qquad\qquad B \;\longrightarrow\; E < E \mid \neg B \mid B \wedge B \,,$$

where x ∈ Vars and r ∈ Q ∪ TVars. The set TExp of templates then consists of all

$$T = [B\_1] \cdot E\_1 + \dots + [B\_n] \cdot E\_n \, ,$$

for n ≥ 1, where the Boolean expressions B<sub>i</sub> partition the state space, i.e. for all template valuations ℑ and all states s, there is exactly one B<sub>i</sub> such that ℑ, s ⊨ B<sub>i</sub>. T is a fixed-partition template if additionally no B<sub>i</sub> contains a template variable.

Notice that templates are generally not linear (over Vars ∪ TVars). Sect. 2 gives several examples of templates, e.g. Eq. (1).

Template Instances. We denote by T[ℑ] the instance of template T under a template valuation ℑ, i.e. the expression obtained from substituting every template variable α in T by its valuation ℑ(α). For example, the expression in Eq. (2) on p. 5 is an instance of the template in Eq. (1) on p. 5. The set of all instances of template T is defined as ⟨T⟩ = { T[ℑ] | ℑ: TVars → Q }. We chose the shape of templates on purpose: To evaluate an instance T[ℑ] of a template T in a state s, it suffices to find the unique Boolean expression B<sub>i</sub> with ℑ, s ⊨ B<sub>i</sub> and then evaluate the single linear arithmetic expression E<sub>i</sub>[ℑ] in s. For fixed-partition templates, the selection of the right B<sub>i</sub> does not even depend on the template valuation ℑ.
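The evaluation scheme just described can be sketched in a few lines of Python. The representation below (a list of guard and linear-expression pairs, with template variables encoded as strings) is an assumption made for illustration and is not the data structure of the actual tool. It encodes the fixed-partition template of Eq. (1), with the implicit zero-valued piece made explicit so that the guards partition the state space, and instantiates it with the values of Eq. (2).

```python
from fractions import Fraction

P, R = 8_000_000, 10

def running(s):
    return s["fail"] < R and s["sent"] < P

def failed(s):
    return s["fail"] == R

# Each piece is (guard, linear expression); coefficients are rationals or template-variable names.
template_eq1 = [
    (running, {"sent": "alpha", "fail": "beta", "1": "gamma"}),
    (failed, {"sent": 0, "fail": 0, "1": 1}),
    (lambda s: not running(s) and not failed(s), {"sent": 0, "fail": 0, "1": 0}),
]

def instantiate(template, valuation):
    """Substitute every template variable by its value under the given valuation."""
    def value(c):
        return Fraction(valuation[c]) if isinstance(c, str) else Fraction(c)
    return [(guard, {k: value(c) for k, c in expr.items()}) for guard, expr in template]

def evaluate(instance, state):
    """Pick the unique piece whose guard holds and evaluate its linear expression."""
    matching = [expr for guard, expr in instance if guard(state)]
    assert len(matching) == 1, "guards must partition the state space"
    expr = matching[0]
    return expr["sent"] * state["sent"] + expr["fail"] * state["fail"] + expr["1"]

valuation_eq2 = {"alpha": Fraction(-9, 8 * 10**7),
                 "beta": Fraction(79_991, 72 * 10**7),
                 "gamma": Fraction(9, 10)}
instance_eq2 = instantiate(template_eq1, valuation_eq2)
```

Since the template is fixed-partition, the guards never mention a template variable, so piece selection in `evaluate` is independent of the valuation.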

Piecewise Linear Expectations. Some template instances T[ℑ] do not represent expectations, i.e. they are not of type S → R<sup>∞</sup><sub>≥0</sub>, as they may evaluate to negative numbers. Template instances T[ℑ] that do represent expectations are piecewise linear; we collect such well-defined instances in the set LinExp. Formally,

Definition 1 (LinExp). The set LinExp of (piecewise) linear expectations is LinExp = { T[ℑ] | T ∈ TExp and ℑ: TVars → Q and ∀s ∈ S : T[ℑ](s) ≥ 0 }.

We identify well-defined instances of templates in LinExp with the expectation in E that they represent, e.g. when writing the inductivity check Φ<sub>f</sub>(T[ℑ]) ⪯<sup>?</sup> T[ℑ].

Natural Templates. As suggested in Sect. 2.3, it makes sense to focus only on so-called natural templates. Those are templates that even have a chance of becoming inductive, as they take the loop guard ϕ and postexpectation f into account. Formally, a template T is natural (w.r.t. ϕ and f) if T is of the form

$$T = \underbrace{[\neg\varphi\land B\_1]\cdot E\_1 + \dots + [\neg\varphi\land B\_n]\cdot E\_n}\_{\text{must be equivalent to } [\neg\varphi]\cdot f} + [B'\_1]\cdot E'\_1 + \dots + [B'\_m]\cdot E'\_m \, .$$

We collect all natural templates in the set TnExp.

Formal Problem Statement. Throughout this paper, we fix an ambient single loop while ( ϕ ) { C }, a postexpectation f ∈ LinExp, and a preexpectation g ∈ LinExp<sup>11</sup> such that lfp Φ<sub>f</sub> ⪯ g<sup>12</sup>. The set AdmInv of admissible invariants (i.e. those expectations that are both inductive and safe) is then given by

$$\mathsf{AdmInv} = \{ \underbrace{I \in \mathsf{LinExp}}\_{\text{well-definedness}} \mid \underbrace{\Phi\_f(I) \preceq I}\_{\text{inductivity}} \quad \text{and} \quad \underbrace{I \preceq g}\_{\text{safety}} \},$$

where the underbraces summarize the tasks for a verifier to decide whether a template instance I is an admissible inductive invariant. We require lfp Φ<sub>f</sub> ⪯ g, so that AdmInv is not vacuously empty due to an unsafe bound g.

Formal problem statement: Given a natural template T, find an instantiation I ∈ ⟨T⟩ ∩ AdmInv or determine that there is no such I.

Notice that AdmInv might be empty, even for safe g's, because generally one might need more complex invariants than piecewise linear ones [16]. However, there always exists an inductive invariant in LinExp if a loop can reach only finitely many states.<sup>13</sup> We call a loop while ( ϕ ) { C } finite-state, if only finitely many states satisfy the loop guard ϕ, i.e. if S<sub>ϕ</sub> = { s ∈ S | s ⊨ ϕ } is finite.

<sup>11</sup> To enable declaring certain states as irrelevant, we additionally allow E<sub>i</sub> = ∞ in the linear preexpectation g = [B<sub>1</sub>] · E<sub>1</sub> + . . . + [B<sub>n</sub>] · E<sub>n</sub>.

<sup>12</sup> We discuss in Sect. 6 how to reason about lower bounds g ⪯ lfp Φ<sub>f</sub>.

<sup>13</sup> Bluntly just choose as many pieces as there are states.

Syntactic Characteristic Functions. We work with linear expectations I, f ∈ LinExp, so that we can check inductivity (Φ<sub>f</sub>(I) ⪯ I) symbolically (via SMT) without state space construction. In particular, we can construct a syntactic counterpart Ψ<sub>f</sub> to Φ<sub>f</sub> that operates on templates. Intuitively, whether we evaluate Ψ<sub>f</sub> on a (syntactic) template T and then instantiate the result with a valuation ℑ, or we evaluate Φ<sub>f</sub> on the (semantic) expectation T[ℑ] emerging from instantiating T with ℑ, the results will coincide if T[ℑ] is well-defined. Formally:

Proposition 2. Given while ( ϕ ) { C } and f ∈ LinExp, one can effectively compute a mapping Ψ<sub>f</sub> : TExp → TExp, such that for all T and ℑ

$$T[\Im] \in \mathsf{LinExp} \quad \text{implies} \quad \Psi\_f(T)[\Im] = \Phi\_f\left(T[\Im]\right) \, .$$

Moreover, Ψ<sub>f</sub> maps fixed-partition templates to fixed-partition templates.

In Ex. 1, we have already constructed such a Ψ<sub>f</sub> to represent Φ<sub>f</sub>. The general construction is inspired by [14], but treats template variables as constants.

### 4 One-Shot Solver

One could address the template instantiation problem from Sect. 3 in one shot: encode it as an SMT query, ask a solver for a model, and infer from the model an admissible invariant. While this approach is infeasible in practice (as it involves quantification over S<sub>ϕ</sub>), it inspires the CEGIS loop in Fig. 1.

Regarding the encoding, given a template T, we need a formula over TVars that is satisfiable if and only if there exists a template valuation ℑ such that T[ℑ] is an admissible invariant, i.e. T[ℑ] ∈ AdmInv. To get rid of program variables in templates, we denote by T(s) the expression over TVars in which all program variables x ∈ Vars have been substituted by s(x).

Intuitively, we then encode that, for every state s, the expression T(s) satisfies the three conditions of admissible invariants, i.e. well-definedness, inductivity, and safety. In particular, we use Prop. 2 to compute a template Ψ<sub>f</sub>(T) that represents the application of the characteristic function Φ<sub>f</sub> to a candidate invariant, i.e. Φ<sub>f</sub>(T[ℑ]), which is necessary for encoding inductivity.

Formally, we denote by Sat(φ) the set of all models of a first-order formula φ (with a fixed underlying structure), i.e. Sat(φ) = { ℑ | ℑ ⊨ φ }. Then:

Theorem 1. For every natural template T ∈ TnExp and f, g ∈ LinExp, we have

$$\langle T \rangle \cap \mathsf{AdmInv} \neq \emptyset \quad \text{iff} \quad \mathsf{Sat}\left(\forall s \in S\_{\varphi} \colon \underbrace{0 \le T(s)}\_{\text{well-definedness}} \land \underbrace{\Psi\_{f}(T)(s) \le T(s)}\_{\text{inductivity}} \land \underbrace{T(s) \le g(s)}\_{\text{safety}}\right) \ne \emptyset \text{ .}$$

Notice that, for fixed-partition templates, the above encoding is particularly simple: T(s) and Ψ<sub>f</sub>(T)(s) are equivalent to single linear arithmetic expressions over TVars; g(s) is either a single expression or ∞. In the latter case, we get an equisatisfiable formula by dropping the always-satisfied constraint T(s) ≤ g(s).
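To make the fixed-partition case concrete, the Python sketch below computes T(s) and Ψ<sub>f</sub>(T)(s) for the template of Eq. (1) as linear forms over the template variables and derives the inductivity constraint for a given state. The dictionary representation of linear forms and all function names are hypothetical; the loop semantics is the one assumed in Example 1.

```python
from fractions import Fraction

P, R = 8_000_000, 10
P_OK, P_FAIL = Fraction(999, 1000), Fraction(1, 1000)
ZERO = {"alpha": Fraction(0), "beta": Fraction(0), "gamma": Fraction(0), "1": Fraction(0)}

def T(sent, fail):
    """T(s) for the template of Eq. (1): a linear form over alpha, beta, gamma (key "1" is the constant)."""
    if fail < R and sent < P:
        return {"alpha": Fraction(sent), "beta": Fraction(fail), "gamma": Fraction(1), "1": Fraction(0)}
    return {**ZERO, "1": Fraction(int(fail == R))}

def scale(c, form):
    return {k: c * v for k, v in form.items()}

def add(f, g):
    return {k: f[k] + g[k] for k in f}

def psi_T(sent, fail):
    """Psi_f(T)(s): apply the characteristic function while treating template variables as constants."""
    if fail < R and sent < P:
        return add(scale(P_OK, T(sent + 1, 0)), scale(P_FAIL, T(sent, fail + 1)))
    return {**ZERO, "1": Fraction(int(fail == R))}

def inductivity_lemma(sent, fail):
    """Linear form L such that inductivity at s reads L <= 0, i.e. Psi_f(T)(s) - T(s) <= 0."""
    return add(psi_T(sent, fail), scale(Fraction(-1), T(sent, fail)))

def evaluate_form(form, valuation):
    return form["1"] + sum(form[k] * valuation[k] for k in valuation)

lemma = inductivity_lemma(7_999_999, 9)
zero = {"alpha": Fraction(0), "beta": Fraction(0), "gamma": Fraction(0)}
eq2 = {"alpha": Fraction(-9, 8 * 10**7), "beta": Fraction(79_991, 72 * 10**7), "gamma": Fraction(9, 10)}
```

For the counterexample state of Sect. 2.3, `lemma` encodes 0.001 − 7999999·α − 9·β − γ ≤ 0, which is the learned lemma shown there. The all-zero valuation violates it, while the valuation of Eq. (2) satisfies it with equality. A conjunction of such forms over finitely many states is a system of linear inequalities in the template variables.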

For general templates, one can exploit the partitioning to break it down into multiple inequalities, i.e. every inequality becomes a conjunction over implications of linear inequalities over the template variables TVars.

Example 2. Reconsider template T in Eq. (3) on p. 6 and assume a state s with s(fail) = 5 and s(sent) = 2. Then, we encode the well-definedness, T(s) ≥ 0, as

$$\left(5 < 10 \land 2 < \delta \Rightarrow \alpha\_1 \cdot 2 + \beta\_1 \cdot 5 + \gamma\_1 \ge 0\right) \land \left(5 < 10 \land 2 \ge \delta \Rightarrow \alpha\_2 \cdot 2 + \beta\_2 \cdot 5 + \gamma\_2 \ge 0\right)$$

where the trivially satisfiable conjunct 5 = 10 ⇒ true encoding the last summand, i.e. [fail = 10], has been dropped.

The query in Thm. 1 involves (non-linear) mixed real and integer arithmetic with quantifiers, a theory that is undecidable in general. However, for finite-state loops and natural templates, one can replace the universal quantifier ∀s by a finite conjunction ⋀<sub>s∈S<sub>ϕ</sub></sub> to obtain a (decidable) QF_LRA formula.

Theorem 2. The problem ⟨T⟩ ∩ AdmInv ≠<sup>?</sup> ∅ is decidable for finite-state loops and T ∈ TnExp. If T is fixed-partition, it is decidable via linear programming.

### 5 Constructing an Efficient CEGIS Loop

We now present a CEGIS loop (see inner loop of Fig. 1) in which a synthesizer and a verifier attempt to incrementally solve our problem statement (cf. p. 9).

#### 5.1 The Verifier

We assume a verifier for checking I ∈<sup>?</sup> AdmInv. For CEGIS, it is important to get some feedback whenever I ∉ AdmInv. To this end, we define:

Definition 2. For a state s ∈ S, the set AdmInv(s) of s-admissible invariants is

$$\mathsf{AdmInv}(s) = \{ I \mid \underbrace{I(s) \ge 0}\_{s \text{-well-defined}} \quad \text{and} \quad \underbrace{\Phi\_f(I)(s) \le I(s)}\_{s \text{-inductive}} \quad \text{and} \quad \underbrace{I(s) \le g(s)}\_{s \text{-safe}} \} \, .$$

For a subset S′ ⊆ S of states, we define AdmInv(S′) = ⋂<sub>s∈S′</sub> AdmInv(s).

Clearly, if I ∉ AdmInv, then I ∉ AdmInv(s) for some s ∈ S, i.e. state s is a counterexample to well-definedness, inductivity, or safety of I. We denote the set of all such counterexamples (to the claim I ∈ AdmInv) by CounterEx<sub>I</sub>. We assume an effective (baseline) verifier for detecting counterexamples:

Definition 3. A verifier is any function Verify : LinExp → {true} ∪ S such that

Verify(I) = true iff I ∈ AdmInv , and Verify(I) = s implies s ∈ CounterEx<sub>I</sub> .

Proposition 3 ([14]). There exist effective verifiers.

For example, one can implement an SMT-backed verifier using an encoding analogous to Thm. 1, where every model is a counterexample s ∈ CounterEx<sub>I</sub>:

$$I \notin \mathsf{AdmInv} \quad \text{iff} \quad \underbrace{\mathsf{Sat}\left(\neg\left(0 \le I \land \Phi\_f(I) \le I \land I \le g\right)\right) \ne \emptyset}\_{\exists s \in S \colon \ I \notin \mathsf{AdmInv}(s)} \quad . $$

Algorithm 1: Template-Instance Synthesizer for template T

    S′ ← ∅ ;
    while Synt_T(S′) ≠ false do
        I ← Synt_T(S′) ;
        result ← Verify(I) ;
        if result = true then
            return I ;              /* Verifier returns true, we have I ∈ AdmInv */
        S′ ← S′ ∪ {result} ;        /* result is a counterexample */
    return false ;                  /* ⟨T⟩ ∩ AdmInv = ∅ */

#### 5.2 The Counterexample-Guided Inductive Synthesizer

A synthesizer must generate from a given template T instances I ∈ ⟨T⟩ which can be passed to a verifier for checking admissibility. To make an informed guess, our synthesizers can take a finite set of witnesses S′ ⊆ S into account:

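Definition 4. Let FinStates be the set of finite sets of states. A synthesizer for template T ∈ TnExp is any function Synt<sub>T</sub> : FinStates → ⟨T⟩ ∪ {false} such that

Synt<sub>T</sub>(S′) = I implies I ∈ ⟨T⟩ ∩ AdmInv(S′) , and Synt<sub>T</sub>(S′) = false iff ⟨T⟩ ∩ AdmInv(S′) = ∅ .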

To build a synthesizer Synt<sub>T</sub>(S′) for finite sets of states S′ ⊆ S, we proceed analogously to one-shot solving for finite-state loops (Thm. 2), i.e. we exploit

$$T\left[\Im\right] \in \mathsf{AdmInv}(S') \quad \text{iff} \quad \Im \models \bigwedge\_{s \in S'} \underbrace{0 \le T(s) \land \Psi\_f(T)(s) \le T(s) \land T(s) \le g(s)}\_{T[\Im] \in \mathsf{AdmInv}(s)}.$$

That is, our synthesizer may return any model ℑ of the above constraint system; it can be implemented as one SMT query. In particular, one can efficiently find such an ℑ for fixed-partition templates via linear programming.

Theorem 3 (Synthesizer Completeness). For finite-state loops and natural templates T ∈ TnExp, we have Synt<sub>T</sub>(S<sub>ϕ</sub>) ∈ AdmInv or ⟨T⟩ ∩ AdmInv = ∅.

Using the synthesizer and verifier in concert is then intuitive as in Alg. 1. We incrementally ask our synthesizer to provide a candidate invariant I that is s-admissible for all states s ∈ S′. Unless the synthesizer returns false, we ask the verifier whether I is admissible. If yes, we return I; otherwise, we get a counterexample s and add it to S′ before synthesizing the next candidate.
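The interplay of synthesizer and verifier can be illustrated with a deliberately small, self-contained Python sketch. Everything concrete in it is an assumption made for illustration: a scaled-down BRP instance (3 packets, 2 retransmissions, success probability 9/10), a one-parameter fixed-partition template [guard] · γ + [fail = M], a brute-force verifier that enumerates the explicit state space, and a synthesizer that solves the learned constraints by interval intersection (which suffices because each constraint is affine in the single parameter γ). An actual implementation would use SMT or LP solving for both components.

```python
from fractions import Fraction

N, M = 3, 2                                            # hypothetical bounds: packets, retransmissions
P_OK, P_FAIL = Fraction(9, 10), Fraction(1, 10)
STATES = [(sent, fail) for sent in range(N + 1) for fail in range(M + 1)]
S0 = (0, 0)

def T(gamma, s):
    """Instance of the template [sent < N and fail < M] * gamma + [fail = M]."""
    sent, fail = s
    return gamma if sent < N and fail < M else Fraction(int(fail == M))

def phi_T(gamma, s):
    sent, fail = s
    if sent < N and fail < M:
        return P_OK * T(gamma, (sent + 1, 0)) + P_FAIL * T(gamma, (sent, fail + 1))
    return Fraction(int(fail == M))

def constraints(gamma, s, lam):
    """Terms that must all be <= 0 for s-admissibility (safety only matters at the initial state)."""
    cs = [-T(gamma, s), phi_T(gamma, s) - T(gamma, s)]
    if s == S0:
        cs.append(T(gamma, s) - lam)
    return cs

def verify(gamma, lam):
    """Return True if the instance is admissible, otherwise a counterexample state."""
    for s in STATES:
        if any(c > 0 for c in constraints(gamma, s, lam)):
            return s
    return True

def synthesize(witnesses, lam):
    """Return some gamma admissible at all witnesses, or None if no such gamma exists."""
    lo, hi = None, None
    for s in witnesses:
        at0, at1 = constraints(Fraction(0), s, lam), constraints(Fraction(1), s, lam)
        for c0, c1 in zip(at0, at1):                   # each constraint is c0 + (c1 - c0) * gamma <= 0
            slope = c1 - c0
            if slope == 0:
                if c0 > 0:
                    return None
            elif slope > 0:
                hi = -c0 / slope if hi is None else min(hi, -c0 / slope)
            else:
                lo = -c0 / slope if lo is None else max(lo, -c0 / slope)
    if lo is not None and hi is not None and lo > hi:
        return None
    if lo is not None:
        return lo
    return Fraction(0) if hi is None or hi >= 0 else hi

def cegis(lam):
    witnesses = []
    while True:
        gamma = synthesize(witnesses, lam)
        if gamma is None:
            return None                                # template cannot be instantiated
        result = verify(gamma, lam)
        if result is True:
            return gamma                               # admissible invariant found
        witnesses.append(result)                       # learn from the counterexample
```

With threshold λ = 1 the loop first guesses γ = 0, learns from the counterexample (sent = 0, fail = 1) that γ ≥ 1 is required, and then succeeds with γ = 1. With λ = 1/2 a second counterexample at the initial state makes the constraints unsatisfiable, so `cegis` returns `None`; this mirrors the situation in which the template generator has to supply a refined template.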

Remark 1. Without further restrictions, the verifier of Def. 3 may go into a counterexample enumeration spiral. In [12, Appx. C], we therefore discuss additional constraints that make this verifier act more cooperatively.

### 6 Generalization to Termination and Lower Bounds

We extend our approach to (i) proving universal positive almost-sure termination (UPAST) – termination in finite expected runtime on all inputs, see [42, Sect. 6] – by synthesizing piecewise linear upper bounds on expected runtimes, and to (ii) verifying lower bounds on possibly unbounded expected values.

UPAST. We leverage Kaminski et al.'s weakest-precondition-style calculus for reasoning about expected runtimes [44,45]:

Proposition 4. For every loop while ( ϕ ) { C }, the monotone function

Θ: E → E, Θ(I)(s) = 1 + Φ<sub>0</sub>(I)(s) ,

obtained from Φ<sub>0</sub> (cf. Prop. 1) satisfies

 (lfp Θ)(s) = "expected number of loop guard evaluations when executing while ( ϕ ) { C } on s" .

All properties of Φ<sub>0</sub> relevant to our approach carry over to Θ, thus enabling the synthesis of inductive invariants I ∈ LinExp satisfying 0 ⪯ I and Θ(I) ⪯ I. Such I upper-bound the expected number of loop iterations [44] and, since expectations in LinExp never evaluate to infinity, I witnesses UPAST of the while-loop.
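As a small illustration, consider the loop while ( 0 < x ) { { x := x − 1 } [1/2] { skip } }, which is a hypothetical example and not one of the benchmarks. The Python sketch below spells out Θ for this loop and spot-checks that the linear candidate I(x) = 2x + 1 satisfies 0 ⪯ I and Θ(I) ⪯ I on a bounded range of states. A bounded check of this kind is not a proof; the approach described here discharges the condition for all states symbolically.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def theta(I):
    """Theta(I)(x) = 1 + Phi_0(I)(x) for: while (0 < x) { { x := x - 1 } [1/2] { skip } }."""
    def theta_I(x):
        if 0 < x:
            return 1 + HALF * I(x - 1) + HALF * I(x)   # one guard evaluation plus expected rest
        return Fraction(1)                             # guard fails: Phi_0 contributes f = 0
    return theta_I

def candidate(x):
    """Candidate runtime bound I(x) = 2x + 1 (an assumption checked below)."""
    return Fraction(2 * x + 1)

inductive_on_sample = all(
    0 <= candidate(x) and theta(candidate)(x) <= candidate(x) for x in range(200)
)
```

For this loop the candidate is in fact a fixed point of Θ on the sampled states: each decrement takes two iterations in expectation, and one final guard evaluation detects termination.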

Lower Bounds. Consider the problem of verifying a lower bound g ⪯ lfp Φ<sub>f</sub> for some loop C′ = while ( ϕ ) { C }. It is straightforward to modify our CEGIS approach for synthesizing sub-invariants, i.e. I ∈ LinExp with I ⪯ Φ<sub>f</sub>(I). However, Hark et al. [36] showed that sub-invariants do not necessarily lower-bound lfp Φ<sub>f</sub>; they hence proposed a more involved yet sound induction rule for lower bounds:

Theorem 4 (Adapted from Hark et al. [36]). Let T be a natural template and I ∈ ⟨T⟩. If 0 ⪯ I, I ⪯ Φ<sub>f</sub>(I), and C′ is UPAST, then

$$\underbrace{\exists \, c \in \mathbb{R}\_{\geq 0} \,\forall \, s \in S\_{\varphi} \colon \quad \Phi\_f\left(|I - I(s)|\right)(s) \leq c}\_{I \text{ is conditionally difference bounded (c.d.b.)}} \quad \text{implies} \quad I \preceq \text{lfp } \Phi\_f \, .$$

Akin to Prop. 2, given T ∈ TnExp, we can compute T′ ∈ TnExp s.t. for all ℑ,

T[ℑ] ∈ LinExp implies T′[ℑ] = λs. Φ<sub>f</sub>(|T[ℑ] − T[ℑ](s)|)(s) ,

which facilitates the extension of our verifier and synthesizer (see Sect. 5) for encoding and checking conditional difference boundedness. Hence, we can employ our CEGIS framework for verifying g ⪯ lfp Φ<sub>f</sub> by (i) proving UPAST of C′ as demonstrated above and (ii) synthesizing a c.d.b. sub-invariant I with g ⪯ I.
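The two conditions on I can be illustrated on a hypothetical loop, while ( 0 < x ) { { x := x − 1 } [1/2] { y := y + 1 } } with postexpectation f = y, which is not taken from the benchmarks. The Python sketch below spot-checks on a finite grid that the candidate I = x + y is a sub-invariant and that Φ<sub>f</sub>(|I − I(s)|)(s) is bounded by the constant c = 1 for all sampled states satisfying the guard. As before, this is a bounded check under stated assumptions and not a proof.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def phi(I):
    """Characteristic function for: while (0 < x) { { x := x - 1 } [1/2] { y := y + 1 } }, f = y."""
    def phi_I(x, y):
        if 0 < x:
            return HALF * I(x - 1, y) + HALF * I(x, y + 1)
        return Fraction(y)                             # guard fails: postexpectation f = y
    return phi_I

def I(x, y):
    """Candidate sub-invariant I = x + y (an assumption checked below)."""
    return Fraction(x + y)

def cdb_value(x, y):
    """Phi_f(|I - I(s)|)(s) for the state s = (x, y)."""
    reference = I(x, y)
    return phi(lambda a, b: abs(I(a, b) - reference))(x, y)

grid = [(x, y) for x in range(30) for y in range(30)]
sub_invariant_on_grid = all(I(x, y) <= phi(I)(x, y) for x, y in grid)
cdb_on_grid = all(cdb_value(x, y) <= 1 for x, y in grid if 0 < x)
```

In every guard state one body execution changes I by exactly one in either direction, so the expected absolute difference is 1. Together with UPAST of the loop, Thm. 4 would then justify I as a lower bound on the expected final value of y.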

### 7 Empirical Evaluation

We have implemented a prototype of our techniques called cegispro2<sup>14</sup>: CEGIS for PRObabilistic PROgrams. The tool is written in Python using pySMT [34] with Z3 [49] as the backend for SMT solving. cegispro2 proves upper or lower bounds on expected outcomes of a probabilistic program by synthesizing quantitative inductive invariants. We investigate the applicability and scalability of our approach with a focus on the expressiveness of piecewise linear invariants. Moreover, we compare with three state-of-the-art tools – Storm [39], Absynth [50], and Exist [9] – on subsets of their benchmarks fitting into our framework.

<sup>14</sup> https://github.com/moves-rwth/cegispro2

Fig. 4: Performance of cegispro2 vs. state-of-the-art tools on three verification tasks (time in seconds, log-scaled; MO=8GB). Markers above the solid line depict benchmarks where cegispro2 is faster (in different orders of magnitude marked by the dashed lines).

Template Refinement. We start with a fixed-partition template T<sub>1</sub> constructed automatically from the syntactic structure of the given loop (i.e. the loop guard and branches in the loop body, see e.g. Eq. (1)). If we learn that T<sub>1</sub> admits no admissible invariant, we generate a refined template T<sub>2</sub>, and so on, until we find a template T<sub>i</sub> with ⟨T<sub>i</sub>⟩ ∩ AdmInv ≠ ∅ or realize that no further refinement is possible. We implemented three strategies for template refinement (including one producing non-fixed-partition templates); see [12, Appx. D] for details.

Finite-State Programs. Fig. 4a depicts experiments on verifying upper bounds on expected values of finite-state programs. For each benchmark, i.e. program and property with increasingly sharper bounds, we evaluate cegispro2 on all template-refinement strategies (cf. [12, Appx. D]). We compare explicit- and symbolic-state engines of the probabilistic model checker Storm 1.6.3 [39] with exact arithmetic. Storm implements LP-based model checking (as in Sect. 4) but employs more efficient methods in its default configuration. Fig. 4a depicts the runtime of the best configuration. See detailed configurations in [12, Appx. E.1].

Results. (i) Our CEGIS approach synthesizes inductive invariants for a variety of programs. We mostly find syntactically small invariants with a small number of counterexamples compared to the state-space size (cf. [12, Tab. 2]). This indicates that piecewise linear inductive invariants can be sufficiently expressive for the verification of finite-state programs. The overall performance of cegispro2 depends highly on the sharpness of the given thresholds. (ii) Our approach can outperform state-of-the-art explicit- and symbolic-state model checking techniques and can scale to huge state spaces. There are also simple programs where our method fails to find an inductive invariant (gridbig) or finds inductive invariants only for rather simple properties while requiring many counterexamples (gridsmall). Whether we need more sophisticated template refinements or whether these programs are not amenable to piecewise linear expectations is left for future work. (iii) There is no clear winner between the two fixed-partition template-refinement strategies (cf. [12, Tab. 2]). We further observe that the non-fixed-partition refinement is not competitive as significantly more time is spent in the synthesizer to solve formulae with Boolean structures. We thus conclude that searching for good fixed-partition templates in a separate outer loop (cf. Fig. 1) pays off.

Proving UPAST. Fig. 4b depicts experiments on proving UPAST of (possibly infinite-state) programs taken from [50] (restricted to N-valued, linear programs with flattened nested loops). We compare to the LP-based tool Absynth [50] for computing upper bounds on expected runtimes. These benchmarks do not require template refinements. More details are given in [12, Appx. E.2].

Results. cegispro2 can prove UPAST of various infinite-state programs from the literature using very few counterexamples. Absynth mostly outperforms cegispro2<sup>15</sup>, which is to be expected as Absynth is tailored to the computation of expected runtimes. Remarkably, the runtime bounds synthesized by cegispro2 are often as tight as the bounds synthesized by Absynth (cf. [12, Tab. 3]).

Verifying Lower Bounds. Fig. 4c depicts experiments aiming to verify lower bounds on expected values of (possibly infinite-state) programs taken from [9]. We compare to Exist [9]<sup>16</sup>, which combines CEGIS with sampling- and ML-based techniques. However, Exist synthesizes sub-invariants only, which might be unsound for proving lower bounds (cf. Sect. 6). Thus, for a fair comparison, Fig. 4c depicts experiments where both Exist and cegispro2 synthesize sub-invariants only, whereas in Fig. 4d, we compare cegispro2 that finds sub-invariants only with cegispro2 that additionally proves UPAST and c.d.b., thus obtaining sound lower bounds as per Thm. 4. No benchmark requires template refinements.

<sup>15</sup> Absynth uses floating-point arithmetic whereas cegispro2 uses exact arithmetic.

<sup>16</sup> Exist supports parametric probabilities, which are not supported by our tool. We have instantiated these parameters with varying probabilities to enable a comparison.

Results. cegispro2 is capable of verifying quantitative lower bounds and outperforms Exist (on 30/32 benchmarks) for synthesizing sub-invariants. Additionally proving UPAST and c.d.b. naturally requires more time. A manual inspection reveals that, for most TO/MO cases in Fig. 4d, there is no c.d.b. sub-invariant. One soundness check times out, since we could not prove UPAST for that benchmark.

#### 8 Related Work

We discuss related works in invariant synthesis, probabilistic model checking, and symbolic inference. ICE [33] is a template-based, cex.-guided technique for learning invariants. More inductive synthesis approaches are surveyed in [4,29].

Quantitative Invariant Synthesis. Apart from the discussed method [9], constraint solving-based approaches [30,26,46] aim to synthesize quantitative invariants for proving lower bounds over R-valued program variables – arguably a simplification as it allows solvers to use (decidable) real arithmetic. In particular, [26] also obtains linear constraints from counterexamples ensuring certain validity conditions on candidate invariants. Apart from various technical differences, we identify three conceptual differences: (i) we support piecewise expectations which have been shown sufficiently expressive for verifying quantitative reachability properties; (ii) we focus on the integration of fast verifiers over efficiently decidable theories; and (iii) we do not need to assume termination or boundedness of expectations.

Various martingale-based approaches, such as [19,23,24,32,31,2,48], aim to synthesize quantitative invariants over R-valued variables, see [55] for a recent survey. Most of these approaches yield invariants for proving almost-sure termination or bounding expected runtimes. ε-decreasing supermartingales [19,20] and nonnegative repulsing supermartingales [55] can upper-bound arbitrary reachability probabilities. In contrast, we synthesize invariants for proving upper or lower bounds for more general quantities, i.e. expectations. [10] can prove bounds on expected values via symbolic reasoning and Doob's decomposition, which, however, requires user-supplied invariants and hints. [1] employs a CEGIS loop to train a neural network dedicated to learning a ranking supermartingale witnessing UPAST of (possibly continuous) probabilistic programs. They also use counterexamples provided by SMT solvers to guide the learning process.

The recurrence solving-based approach in [11] synthesizes nonlinear invariants encoding (higher-order) moments of program variables. However, the underlying algebraic techniques are confined to the sub-class of prob-solvable loops.

Probabilistic Model Checking. Symbolic probabilistic model checking focusses mostly on algebraic decision diagrams [6,3], representing the transition relation symbolically and using equation solving or value iteration [8,37,53] on that representation. PrIC3 [15] finds quantitative invariants by iteratively overapproximating k-step reachability. Alternative CEGIS approaches synthesize Markov chains [18] and probabilistic programs [5] that satisfy reachability properties.

Symbolic Inference. Probabilistic inference – in the finite-horizon case – employs weighted model counting via either decision diagrams annotated with probabilities as in Dice [41,40] or approximate versions by SAT/SMT-solvers [21,22,27,54,17]. PSI [35] determines symbolic representations of exact distributions. Prodigy [25] decides whether a probabilistic loop agrees with an (invariant) specification.

Data-Availability Statement The datasets generated during and/or analysed during the current study are available in the Zenodo repository [13].

### References


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

**Runtime Monitoring/Program Analysis**

## Industrial-Strength Controlled Concurrency Testing for C# Programs with Coyote

Pantazis Deligiannis<sup>1</sup>, Aditya Senthilnathan<sup>2</sup>, Fahad Nayyar<sup>3,?</sup>, Chris Lovett<sup>1</sup>, and Akash Lal<sup>2</sup>

> <sup>1</sup> Microsoft Research, Redmond, WA, USA {pdeligia,clovett}@microsoft.com <sup>2</sup> Microsoft Research, Bengaluru, India {t-adityase,akashl}@microsoft.com <sup>3</sup> Apple UK Ltd., London, UK f\_nayyar@apple.com

Abstract. This paper describes the design and implementation of the open-source tool Coyote for testing concurrent programs written in the C# language. Coyote provides algorithmic capabilities to explore the state-space of interleavings of a concurrent program, with deterministic repro for any bug that it finds. Coyote encapsulates multiple ideas from the research community to offer state-of-the-art testing for C# programs, as well as an efficiently engineered implementation that has been shown robust enough to support industrial use.

### 1 Introduction

Testing programs with concurrency is a challenging problem for developers. Concurrency introduces non-determinism in the program, making bugs hard to find, reproduce, and debug [25,43]. In fact, concurrency is one of the main reasons behind flaky tests [34] (tests that may pass or fail without any code changes), causing a significant engineering burden on development teams [31]. As concurrency, in the form of multi-threading or distributed systems, is fundamental to how we build modern systems, solutions are required to help developers test their concurrent code for correctness.

There are two important challenges with testing concurrent programs. First is the problem of reproducibility or control. By default, a programmer does not have control over how concurrent workers interleave during execution.<sup>4</sup> The only programmatic control is through enforcing synchronization, but that is usually not enough to guarantee that certain interleavings can be reproduced. The second challenge is the state-space explosion problem. A concurrent program, even with a fixed test input, can have many possible behaviors; in fact, there can be exponentially many interleavings in terms of the length of the execution.

© The Author(s) 2023

<sup>?</sup> Work was done while the author was at Microsoft Research.

<sup>4</sup> Concurrency comes in many forms: threads, tasks, actors, processes, etc. We use the term workers to abstractly refer to any of these forms.

https://doi.org/10.1007/978-3-031-30820-8\_26 S. Sankaranarayanan and N. Sharygina (Eds.): TACAS 2023, LNCS 13994, pp. 433–452, 2023.

One line of work that attempts to solve these challenges is controlled concurrency testing (CCT) [53]. This approach proposes taking over the scheduling of concurrent workers and then using algorithms, either randomized or systematic, for searching over the space of interleavings. The former (i.e., taking over scheduling) is typically an engineering challenge. It requires understanding the language runtime and building solutions that are efficient, robust and usable. The latter (i.e., searching over the space of interleavings) requires algorithmic and empirical insights on finding bugs, and it has been the main topic of many research publications (e.g., [43,42,55,32,54,10,40,13,53,16,41,48,19,56]). Both these aspects are essential for industrial adoption.

In this paper, we describe the design and implementation of the open-source tool Coyote [7] for controlled concurrency testing of C# programs. Coyote aims to make testing of concurrent programs as easy and natural as testing of sequential programs.

Usage Coyote was released on GitHub in March 2020, and since then its release binaries have been downloaded from nuget.org over a million times. The project has extensive documentation as well as tutorials for developers [8]. Coyote has been used internally in Microsoft for testing multiple different services of the Azure cloud infrastructure. Through the use of lightweight telemetry [9], we have consistently seen over three million seconds of testing each month for the last 12 months, peaking at roughly 13 million seconds in a month. Coyote testing has been invoked 71K times per month on average, reporting around 10K test failures per month on average.

Coyote is also a testing backend for the P language [15], currently used in Amazon for the analysis of several core distributed systems [5]. A P program is compiled to a C# program and fed to Coyote for testing.

Contributions This paper covers the design decisions that were necessary for supporting industrial usage. It is unreasonable to support all programs in a language as broad as C#, so the focus of Coyote has been on the task asynchronous programming (TAP) model [38] that is the recommended and most common way of expressing concurrency and asynchrony in C#. Coyote encapsulates multiple state-space exploration techniques from the literature in order to provide state-of-the-art testing to its users. Coyote is also designed to be extensible, both in supporting other programming models (it already supports an actor programming model [4,12] and support for threads is straightforward), as well as other exploration strategies. This paper also describes a novel search technique specifically for TAP and its evaluation on industrial benchmarks.

Historical journey The origin of the Coyote code base can be traced back to an earlier system called P# [11] that defined a restricted (domain-specific) programming model for communicating state machines. The P# system has since evolved into an actor framework that is still supported by Coyote. However, Coyote itself has generalized to focus on TAP, making it a very different tool from P#. Prior work with Coyote has either focused on exploration strategies [48,40,39] or on applications [12,11,13], but not on the tool itself.

Coyote is useful for practitioners looking for industrial-strength tools (for C#), as well as researchers interested in evaluating new exploration algorithms for concurrency testing. This paper hopes to inspire and inform the reader towards contributing new ideas, features, and case-studies to Coyote.

### 2 The Coyote Tool

The C# task asynchronous programming (TAP) model revolves around the Task type that is used to encapsulate parallel computation. One can spawn a new task to execute in parallel with its parent, wait on an existing task to finish, or query for the result of a task once it has finished. Furthermore, the C# language offers async and await keywords that make it very convenient to write efficient (nonblocking) programs [37]. Similar features are also mainstream in other languages such as Rust, Python, JavaScript and Go, and even C++ has support for them. Their semantics are fairly standard, so we omit them due to space constraints and instead illustrate them with an example.

Fig. 1 shows a typical concurrency test that we will use as a running example in this paper. The RunTest method creates two parallel tasks t1 and t2, waits for them to finish and asserts some condition. A programmer can run this test as-is with Coyote to find if the assertion can fail. There are two key points to note about this example. First, its behavior is interleaving dependent. The loop in SendMessages adds a string to the global list variable that is shared between the two tasks, so its final value will have a mix of strings of the form aN and bN, depending on the interleaving order. (This program has an unsynchronized access to list, but let us assume for simplicity that operations on List are atomic; in practice, one can guard these operations with locks.) Second, while this code seemingly only has two tasks, at runtime it can have up to 100 tasks created by the .NET runtime. The initial task created by SendMessages starts executing the async lambda code, but when it hits the await point, the runtime can (optionally) end the current task and spawn a new one to execute the rest of the code after the awaited expression finishes. (This "magic" happens when async methods get de-sugared by the C# compiler into state machines [52]. This transformation is what allows the code to be non-blocking.) Note that the await in this code can be hit 100 times (50 for each call to SendMessages). We will revisit the complexity imposed by such implicit tasks, both for the tool to take control (§4.1) and for state-space exploration (§3.2); for now, we focus on the user experience.

Coyote use is illustrated in Fig. 2. After the user compiles their C# program containing one or more tests, they invoke the coyote rewrite command-line tool to rewrite their binaries. This automatic rewriting adds instrumentation to the original code to provide the necessary hooks and metadata for Coyote to control the (task-based) concurrency in the program (§3). Next, the user invokes the coyote test command-line tool to run their tests with the Coyote

```
// Shared between the two tasks; List operations are assumed atomic here.
List<string> list = new();

Task SendMessages(string prefix) {
  return Task.Run(async () => {
    for (int val = 0; val < 50; val++) {
      list.Add(string.Concat(prefix, val));
      await Task.Yield();  // scheduling point: an implicit continuation task may be created
    }
  });
}

async Task RunTest() {
  Task t1 = SendMessages("a");
  Task t2 = SendMessages("b");
  await Task.WhenAll(t1, t2);
  Assert.True(predicate(list));
}
```
Fig. 1: Example test code in C# with concurrency.

Fig. 2: Developer workflow when using Coyote.

test engine. The engine runs each test repeatedly for a user-specified number of iterations until a bug (failed assertion or unhandled exception) is found. The engine uses the instrumented hooks to intercept the execution of all workers in the test, and control them to allow only a single worker to execute at a time. The exact choice of which worker to enable in each step is left to an exploration strategy (§3.2).

When a bug is found, Coyote dumps out the sequence of all scheduling decisions taken in that test iteration. The user can replay the test failure using the coyote replay command, as many times as they like, with the C# debugger attached to step through the test deterministically.

Architecture, Extensibility The architecture of Coyote is illustrated in Fig. 3. The test engine exposes an instrumentation API used for declaring the concurrency, and synchronization, used in the program (§3). For task-based programs, the experience is seamless because the rewriting engine takes care of adding calls to this API automatically (§4). One can also add a custom runtime to Coyote. For instance, Coyote supports an actor-based programming model (to code at the level of actors instead of tasks) [12]. The actor runtime, in this case, performs the necessary calls into the Coyote test engine, again providing a seamless experience to users. For other programming models, say, a program using threads directly instead of tasks, these calls must either be inserted manually or a rewriting pass be added to Coyote to add these calls automatically for threads. Exploration strategies are also defined by a simple interface that makes it easy to implement multiple techniques.

The test engine is roughly 11K lines of C# code, the rewriting engine and the actor runtime are 12K lines each, and Coyote is overall 45K lines of code.

Fig. 3: The architecture of Coyote.

Coyote is heavily tested for robustness, with an additional 38K lines of code of unit tests.

Limitations, Requirements Coyote requires a test to be deterministic modulo scheduling between workers. This implies that, for instance, the program should not take a branch based on the current system time, or read data from an external service or a file that may change outside the scope of the test. Coyote also requires that tests be idempotent, that is, running the test twice has the same effect as running it once. This is because Coyote runs a test multiple times without re-starting the hosting process. Idempotence is easy to guarantee by avoiding static variables. Violating these requirements can imply that replay will fail. These are minor requirements, with users seldom complaining about them in our experience so far.

A more significant requirement is that Coyote be able to control all the concurrency created by a test. This may not happen when the program uses an unsupported programming model, or a library that cannot be rewritten because, say, it includes native code, which is outside the scope of coyote rewrite. Coyote has partial defenses against this: when it detects concurrent activity outside its control, it tries to tolerate it by letting it finish on its own (§5), else throws an error to make the user aware.

Coyote does not currently support the detection of low-level data races, i.e., unsynchronized memory accesses, which can indicate concurrency bugs. Race detection requires instrumentation at the level of individual memory accesses, which Coyote avoids for engineering simplicity and lower maintenance costs. (Coyote only instruments at the level of task APIs or synchronization operations.) Nonetheless, coyote rewrite is extensible, and the door is open for any contributor to take on this responsibility and implement race detection [22,49,23,51,50].

    interface Instrumentation
      WorkerId OnWorkerCreated();
      void OnWorkerStarted(WorkerId);
      void OnWorkerCompleted(WorkerId);
      void OnWorkerPaused(WorkerId, P);
      void ScheduleNextWorker(WorkerId);
      WorkerId GetCurrentWorkerId();

Fig. 4: The Coyote test engine instrumentation API.


Fig. 5: Example wrappers for task creation (left) and waiting (right) that call into the Coyote test engine.

### 3 Coyote Test Engine

#### 3.1 Instrumentation API

Fig. 4 lists the core instrumentation API that must be called from the user program to provide the Coyote test engine (CTE) with enough hooks for controlling its concurrency. CTE itself does not have a first-class understanding of TAP (or any programming model for that matter); all information about the program comes through this API, which allows us to keep CTE simple, and also allows easy addition of new programming models.

The instrumentation API takes inspiration from prior work [3] that demonstrated the generality of the API, even outside of C#, at capturing different programming models. Each worker created in the program must inform CTE when it is created (OnWorkerCreated), when it starts running (OnWorkerStarted), and when it completes (OnWorkerCompleted). A worker calls OnWorkerPaused with a predicate P to notify CTE that it has paused its execution and will become unblocked when P evaluates to true. For instance, when a worker pauses to acquire a lock, then P becomes true when the lock is released by some other worker. A worker calls ScheduleNextWorker to ask CTE to consider running a different worker. A worker calls GetCurrentWorkerId to ask CTE for its unique identifier.

Fig. 5 shows wrapper methods for task creation (Run) and waiting on the completion of a set of tasks (WaitAll). These methods implement the original semantics, but additionally call the instrumentation APIs to notify CTE. We show this only for illustrating the instrumentation APIs. In practice, the developer does not have to add these calls. §4 demonstrates how the Coyote binary rewriting engine automatically inserts these calls to cover the broad TAP programming model. An approach that creates a substitute method for each TAP method does not scale. For actor-based programs, the Coyote actor runtime takes care of calling the CTE without the need for binary rewriting.
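To make the role of these calls concrete, the following is a minimal sketch of such wrappers. It is not a reconstruction of Fig. 5: the interface name `IInstrumentation`, the use of `int` for worker ids and `Func<bool>` for the predicate P, and the `RecordingEngine` class (which only logs calls and does no scheduling) are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Mirrors the shape of the API in Fig. 4; the concrete types are assumed.
interface IInstrumentation
{
    int OnWorkerCreated();
    void OnWorkerStarted(int id);
    void OnWorkerCompleted(int id);
    void OnWorkerPaused(int id, Func<bool> resumeWhen);
    void ScheduleNextWorker(int id);
    int GetCurrentWorkerId();
}

// Hypothetical stand-in for the test engine: it records calls, nothing more.
sealed class RecordingEngine : IInstrumentation
{
    public readonly ConcurrentQueue<string> Calls = new ConcurrentQueue<string>();
    private int nextId;

    public int OnWorkerCreated()
    {
        int id = Interlocked.Increment(ref nextId);
        Calls.Enqueue($"created {id}");
        return id;
    }
    public void OnWorkerStarted(int id) => Calls.Enqueue($"started {id}");
    public void OnWorkerCompleted(int id) => Calls.Enqueue($"completed {id}");
    public void OnWorkerPaused(int id, Func<bool> resumeWhen) => Calls.Enqueue($"paused {id}");
    public void ScheduleNextWorker(int id) => Calls.Enqueue($"schedule-next {id}");
    public int GetCurrentWorkerId() => 0; // simplification: the caller is always worker 0
}

static class Wrappers
{
    // Same semantics as Task.Run, plus notifications to the engine.
    public static Task Run(IInstrumentation cte, Action body)
    {
        int id = cte.OnWorkerCreated();
        return Task.Run(() =>
        {
            cte.OnWorkerStarted(id);
            try { body(); }
            finally { cte.OnWorkerCompleted(id); }
        });
    }

    // Same semantics as Task.WaitAll; the predicate says when the caller can resume.
    public static void WaitAll(IInstrumentation cte, params Task[] tasks)
    {
        cte.OnWorkerPaused(cte.GetCurrentWorkerId(), () => tasks.All(t => t.IsCompleted));
        Task.WaitAll(tasks);
    }
}

static class Program
{
    static void Main()
    {
        var engine = new RecordingEngine();
        Task t = Wrappers.Run(engine, () => Console.WriteLine("worker body"));
        Wrappers.WaitAll(engine, t);
        Console.WriteLine(string.Join(", ", engine.Calls));
    }
}
```

A real engine would block and unblock workers inside these callbacks instead of logging; the sketch only shows where the notifications sit relative to the original operations.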

Any time the program invokes CTE via one of these APIs (referred to as a scheduling point or step), CTE blocks the current worker, then looks at the list of workers that are enabled (by inspecting their pause-predicates, if any). It will then query the exploration strategy to select one worker from this list. The selected worker is unblocked (all other workers remain blocked) and is allowed to execute until it hits a scheduling point again, at which point control goes into the CTE and the process repeats. This design of sequentializing workers to execute only one at a time is fairly standard in CCT tools [3].
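The loop described above can be modeled in a few lines. The sketch below is a simplified model, not the CTE implementation: workers are C# iterators rather than real tasks, every `yield return` stands for a scheduling point, pause-predicates are not modeled (a worker is enabled until it finishes), and the names `SequentialEngine`, `Sender` and `choose` are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SequentialEngine
{
    // Runs the workers one at a time. `choose` plays the role of the exploration
    // strategy: it receives the ids of the enabled workers and returns one of them.
    public static List<string> Run(
        IReadOnlyList<IEnumerator<string>> workers,
        Func<IReadOnlyList<int>, int> choose)
    {
        var trace = new List<string>();
        var enabled = Enumerable.Range(0, workers.Count).ToList();
        while (enabled.Count > 0)
        {
            int id = choose(enabled);
            if (workers[id].MoveNext())
                trace.Add(workers[id].Current); // worker ran up to its next scheduling point
            else
                enabled.Remove(id);             // worker completed; removes the value id
        }
        return trace;
    }

    // Models one SendMessages task from the running example.
    public static IEnumerator<string> Sender(string prefix, int count)
    {
        for (int val = 0; val < count; val++)
            yield return prefix + val;
    }
}

static class Program
{
    static void Main()
    {
        var rng = new Random(1);
        var workers = new[] { SequentialEngine.Sender("a", 3), SequentialEngine.Sender("b", 3) };
        var trace = SequentialEngine.Run(workers, enabled => enabled[rng.Next(enabled.Count)]);
        Console.WriteLine(string.Join(" ", trace));
    }
}
```

Because only one worker advances per step and every choice goes through `choose`, recording the sequence of returned ids is enough to replay an execution deterministically in this model.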

#### 3.2 Exploration Strategies

Coyote decouples the concern of how to control workers from how to explore their interleavings. The latter is the responsibility of the exploration strategy, which is defined by a common interface. At its core, the interface has a single method that accepts a list of enabled workers and must return one of them. With most of the heavy lifting performed by CTE, exploration strategies are easy to implement; the largest one is only 400 lines of code. Furthermore, at the time the exploration strategy is invoked, all workers are in a blocked state (blocked by the CTE). Some strategies (like QL and POS; see below) require inspection of the program state. This can be done safely by the strategy without worrying about racing with the program's execution.

The random walk strategy (RW) picks an enabled worker uniformly at random in each step. This simple strategy has been shown to be effective in practice and argued as a necessary baseline for other strategies [53]. The PCT strategy [10] implements a priority-based scheduler. When a worker is created, it is assigned a new randomly-generated priority. At a scheduling point, PCT always picks the enabled worker that has the highest priority. In addition, at d times during an execution (called the bug depth parameter, which is supplied by a user-controlled configuration), PCT lowers the priority of the currently executing worker to be the smallest. These d priority lowering points are picked uniformly across the entire program execution. This priority-based nature helps PCT induce long delays in workers, unlike RW that switches back-and-forth between workers much more frequently.
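As an illustration of how small such strategies can be, the following sketch implements RW and a simplified PCT behind a single-method interface. The interface name `IStrategy`, its `current` parameter, and the `maxSteps` estimate used to place the d priority-lowering points are assumptions of this sketch and do not reflect Coyote's actual interface.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical strategy interface: pick one of the enabled workers.
interface IStrategy
{
    int Next(IReadOnlyList<int> enabled, int current);
}

sealed class RandomWalk : IStrategy
{
    private readonly Random rng;
    public RandomWalk(int seed) { rng = new Random(seed); }

    // Uniform choice among the enabled workers at every step.
    public int Next(IReadOnlyList<int> enabled, int current) => enabled[rng.Next(enabled.Count)];
}

sealed class Pct : IStrategy
{
    private readonly Random rng;
    private readonly Dictionary<int, double> priority = new Dictionary<int, double>();
    private readonly HashSet<int> changePoints = new HashSet<int>();
    private int step;
    private int lowered;

    public Pct(int seed, int depth, int maxSteps)
    {
        rng = new Random(seed);
        // Pick d distinct steps at which the running worker loses its priority.
        while (changePoints.Count < Math.Min(depth, maxSteps))
            changePoints.Add(rng.Next(maxSteps));
    }

    public int Next(IReadOnlyList<int> enabled, int current)
    {
        // A worker seen for the first time gets a fresh random priority in [0, 1).
        foreach (int w in enabled)
            if (!priority.ContainsKey(w)) priority[w] = rng.NextDouble();

        // At a change point, the current worker drops below every other priority.
        if (changePoints.Contains(step++) && priority.ContainsKey(current))
            priority[current] = -(++lowered);

        return enabled.OrderByDescending(w => priority[w]).First();
    }
}

static class Program
{
    static void Main()
    {
        IStrategy strategy = new Pct(seed: 1, depth: 1, maxSteps: 10);
        var enabled = new[] { 0, 1 };
        int current = -1; // no worker is running before the first step
        for (int i = 0; i < 10; i++)
        {
            current = strategy.Next(enabled, current);
            Console.Write(current + " ");
        }
        Console.WriteLine();
    }
}
```

With `depth` set to 0 the PCT sketch keeps running the same worker for as long as it stays enabled, which is the long-delay behavior that distinguishes it from RW.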

Task-based PCT PCT was originally designed for multi-threaded programs. Later work observed its shortcomings for distributed systems and proposed the revised strategy called PCTCP [48]. We now discuss a novel adaptation of the idea behind PCTCP to TAP in a strategy called PCT<sup>t</sup>.

Consider again the program of Fig. 1. Let us define the function predicate to check that the string a49 does not appear before b0 in list. For the assertion in this program to fail, an interleaving must essentially execute t1 to completion before t2 gets a chance. The chance of RW producing this interleaving is tiny: around 1 in 2<sup>50</sup>. If we imagine a thread-based scenario (ideal setting for PCT), where RunTest created two threads instead of tasks, then PCT (with d = 0) has a 50% probability of hitting this bug. This is because if the first thread is assigned a higher priority, it will execute to completion before the second thread gets a chance to execute. However, PCT, with priorities-per-task, is unable to find this bug because of all the implicit tasks that get created at the await point (recall §2). Each time a new task is created, it gets a new randomly-generated priority. In effect, for this program, PCT behaves like RW.

PCTCP addresses this problem by constructing a partial order between workers, where two workers w<sub>1</sub> and w<sub>2</sub> are ordered if the programming model enforces that w<sub>2</sub> must only start after w<sub>1</sub> finishes. This partial order, constructed on-the-fly during program execution, is then decomposed into chains, which are totally-ordered subsets of the partial order. PCTCP then maintains priorities per chain, not per worker. When a new worker starts, it gets assigned to a chain (existing or a new one) and inherits the priority of the chain. PCTCP's effectiveness has only been demonstrated for distributed message-passing systems.

PCT<sup>t</sup> adapts the concept of chains for TAP. On the explicit creation of a task (using Task.Run), it gets assigned to a new chain (hence, it gets a randomly-generated priority). If a task t yields control by executing Task.Yield, the continuation task is assigned to the same chain as t (hence, it inherits its priority). When a task t1 awaits another task t2 to complete, the continuation task of t1 is assigned to the chain of t2 because the continuation can only execute after t2 completes. (In reality, the continuation task is assigned to the chain of the task that completes t2, because t2 may have its own continuations created.) PCT<sup>t</sup> recovers the benefits of PCT; in our running example, only two chains are created, and it can find the bug with a 50% probability.
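The gap between uniform random choice and per-chain priorities can be checked with a small simulation. The sketch below is an abstract model of the running example, not Coyote code: each of the two chains has n steps, the "bug" is hit when chain a performs all n steps before chain b performs its first, and the names `ChainDemo` and `HitRate` are hypothetical. A smaller n than 50 is used so that the RW rate is measurable.

```csharp
using System;

static class ChainDemo
{
    // Fraction of trials in which chain "a" completes all n steps before chain "b"
    // takes its first step (the failing interleaving of the running example).
    public static double HitRate(int n, int trials, bool perChainPriority, int seed)
    {
        var rng = new Random(seed);
        int hits = 0;
        for (int t = 0; t < trials; t++)
        {
            // Per-chain priorities with d = 0: a single draw decides which chain wins throughout.
            bool aHasHigherPriority = rng.Next(2) == 0;
            int aDone = 0;
            while (aDone < n)
            {
                // Random walk: a fresh uniform choice between the two chains at every step.
                bool pickA = perChainPriority ? aHasHigherPriority : rng.Next(2) == 0;
                if (!pickA) break; // b ran before a finished, so this trial misses the bug
                aDone++;
            }
            if (aDone == n) hits++;
        }
        return (double)hits / trials;
    }
}

static class Program
{
    static void Main()
    {
        const int n = 8, trials = 20000;
        Console.WriteLine($"per-chain priorities: {ChainDemo.HitRate(n, trials, true, 1):F4}");
        Console.WriteLine($"random walk:          {ChainDemo.HitRate(n, trials, false, 1):F4}");
    }
}
```

In this model the per-chain rate stays near 1/2 regardless of n, while the random-walk rate is 2<sup>−n</sup> in expectation, which for n = 50 gives the "1 in 2<sup>50</sup>" figure quoted above.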

Other strategies Coyote also implements a strategy based on reinforcement learning (QL) [40]. QL requires a partial hash (or fingerprint) of the program state and then learns a model that maximizes the number of unique fingerprints seen during a test run. Increased coverage helps uncover more bugs. The partial order sampling (POS) strategy [56] uses information about which workers are racing with each other, i.e., they are about to access the same object (either a memory location or a synchronization object). POS uses a priority-based scheduler like PCT, but instead of lowering priority at d chosen points, POS keeps shuffling (i.e., re-assigning) priorities of racing workers at each step.

Other strategies available in Coyote are delay bounding (DB) [19] and variants of RW that use a biased coin. These strategies can also be combined either in the same test iteration (run one strategy for a certain number of steps, then switch to running another strategy) or across iterations (pick a different strategy, in a round-robin fashion, for each iteration).

Data non-determinism Exploration strategies also offer a means to generate unconstrained boolean or integer values. Coyote exposes these APIs to developers, who can use them to express non-determinism in their program. An example is when testing for the robustness of a program against faults. In this case, the developer can non-deterministically choose to raise a fault (like an exception or return an error code) and check that their code can handle the fault correctly. Other examples are non-deterministically firing timeouts, non-deterministically choosing what method to call from a set of equivalent library methods, etc. Most exploration strategies resolve this non-determinism uniformly at random, with the exception of QL that tries to learn, alongside scheduling decisions, what return values are able to maximize program coverage.
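A fault-injection test of this kind might look as follows. This is a sketch under stated assumptions: the `nextBool` delegate is a hypothetical stand-in for an engine-provided source of unconstrained boolean values (here backed by a seeded `Random`), and `SaveWithRetry` is an invented example of code under test; neither is part of Coyote's API.

```csharp
using System;
using System.Collections.Generic;

static class FaultInjection
{
    // Code under test: tries to store an item, retrying when the write "fails".
    // Each call to nextBool() is a non-deterministic choice: true injects a fault.
    public static bool SaveWithRetry(Func<bool> nextBool, int maxAttempts, List<string> store, string item)
    {
        for (int attempt = 0; attempt < maxAttempts; attempt++)
        {
            if (nextBool()) continue; // injected fault: this attempt is lost
            store.Add(item);
            return true;
        }
        return false;
    }
}

static class Program
{
    static void Main()
    {
        // Each seed plays the role of one test iteration with different choices.
        for (int seed = 0; seed < 5; seed++)
        {
            var rng = new Random(seed);
            var store = new List<string>();
            bool ok = FaultInjection.SaveWithRetry(() => rng.Next(2) == 0, 3, store, "x");
            // Property to check: the item is stored exactly once if and only if the call reports success.
            Console.WriteLine($"seed {seed}: ok={ok}, stored={store.Count}");
        }
    }
}
```

Routing the choices through the exploration strategy, rather than through an ad hoc random source, is what lets a failing combination of faults and interleavings be recorded and replayed together.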

Liveness checking In addition to catching safety violations (assertion failures and uncaught exceptions), Coyote can also check liveness properties where, essentially, one asserts that every program run eventually makes progress. The definition of progress is programmable, using the concept of liveness monitors (variant of deterministic Büchi automata) borrowed from the P modeling language [15]. A violation of a liveness property is an infinite run where no progress is made. Testing cannot produce an infinite run, so instead Coyote looks for a sufficiently long execution based on user-set thresholds [27,39]. Liveness properties are not rare. In fact, they are commonly asserted when testing distributed services to check that the service eventually completes every user request [12].

Any exploration strategy can be used for liveness checking, as long as it is fair, i.e., it does not contiguously starve an enabled worker for a long time. Unfairness can easily lead to liveness violations, but such violations are considered false positives because they cannot happen in practice as system scheduling is generally fair. RW is (probabilistically) fair, but PCT is not. Coyote converts unfair strategies to fair ones by running them up to a certain number of scheduling steps and then switching to use RW.
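The conversion from an unfair to a fair strategy can be sketched as a thin wrapper. The class name `FairAfter`, the delegate-based strategy representation, and the `maxUnfairSteps` parameter are assumptions of this sketch rather than Coyote's design.

```csharp
using System;
using System.Collections.Generic;

// Delegates to a (possibly unfair) strategy for a bounded prefix of the execution,
// then falls back to uniform random choice, which is probabilistically fair.
sealed class FairAfter
{
    private readonly Func<IReadOnlyList<int>, int> inner;
    private readonly Random rng;
    private readonly int maxUnfairSteps;
    private int steps;

    public FairAfter(Func<IReadOnlyList<int>, int> inner, int maxUnfairSteps, int seed)
    {
        this.inner = inner;
        this.maxUnfairSteps = maxUnfairSteps;
        rng = new Random(seed);
    }

    public int Next(IReadOnlyList<int> enabled)
    {
        if (steps++ < maxUnfairSteps)
            return inner(enabled);               // unfair prefix
        return enabled[rng.Next(enabled.Count)]; // fair suffix (random walk)
    }
}

static class Program
{
    static void Main()
    {
        // An extreme unfair strategy: always run the first enabled worker.
        var strategy = new FairAfter(enabled => enabled[0], maxUnfairSteps: 5, seed: 1);
        var workers = new[] { 0, 1 };
        for (int i = 0; i < 15; i++)
            Console.Write(strategy.Next(workers) + " ");
        Console.WriteLine();
    }
}
```

The unfair prefix keeps the bug-finding behavior of the original strategy, while the random suffix ensures that a long execution without progress is not merely an artifact of starving one worker.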

### 4 Automation for C# Task Asynchronous Programs

The style of instrumentation shown in Fig. 5 is not practical because there are many ways in which lambdas and tasks can be created (some return a result on completion, some do not, and there are optimized variants of tasks like ValueTask [45], etc.). Imposing directly on the creation process would be very cumbersome. One must also be able to handle both explicit creation of tasks, as well as the implicit creation that happens at await points. After much trial-and-error, we arrived at an efficient solution that is simple and easy to maintain, even as C# itself evolves. We crucially rely on controlling task execution through a narrow lower layer of abstraction in the .NET runtime called the TaskScheduler [44]. We observed that whenever a task is created, it goes to the .NET default task scheduler, which is then responsible for executing the task on the .NET thread pool. This task scheduler can be subclassed, which we do as shown in Fig. 6 (right). Coyote.TaskScheduler offers a convenient place to call into the test engine, without requiring imposition on the creation of the task or its lambda. The job of rewriting then is to route tasks to this scheduler instead of the default task scheduler. We do this by defining simple wrapper methods for Task APIs, and rewriting the user C# binaries to call the wrapper methods instead of the original ones.
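The mechanism of subclassing the .NET `TaskScheduler` can be illustrated independently of Coyote. The sketch below is not the scheduler of Fig. 6: `HookedTaskScheduler` and `TaskWrapperSketch` are hypothetical names, and the "hook" is just a counter placed where a testing tool could call into its engine. Only standard .NET members are used (`QueueTask`, `TryExecuteTaskInline`, `GetScheduledTasks`, `TryExecuteTask`, `Task.Factory.StartNew`).

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

sealed class HookedTaskScheduler : TaskScheduler
{
    private int executed;
    public int Executed => Volatile.Read(ref executed);

    // Called by the runtime for every task routed to this scheduler.
    protected override void QueueTask(Task task)
    {
        ThreadPool.QueueUserWorkItem(_ =>
        {
            Interlocked.Increment(ref executed); // hook: runs just before the task body
            TryExecuteTask(task);
        });
    }

    // Never run tasks inline on the waiting thread, so every task passes through the hook.
    protected override bool TryExecuteTaskInline(Task task, bool taskWasPreviouslyQueued) => false;

    // Only used by debuggers; this sketch does not track queued tasks.
    protected override IEnumerable<Task> GetScheduledTasks() => Array.Empty<Task>();
}

static class TaskWrapperSketch
{
    // Drop-in style replacement for Task.Run(Action) that targets a given scheduler.
    public static Task Run(Action action, TaskScheduler scheduler) =>
        Task.Factory.StartNew(action, CancellationToken.None, TaskCreationOptions.DenyChildAttach, scheduler);
}

static class Program
{
    static void Main()
    {
        var scheduler = new HookedTaskScheduler();
        TaskWrapperSketch.Run(() => Console.WriteLine("task body"), scheduler).Wait();
        Console.WriteLine($"tasks executed through the hook: {scheduler.Executed}");
    }
}
```

Note that the lambda passed to `Run` is left untouched; all control is exercised at the point where the scheduler executes the task, which is the property the paragraph above relies on.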

Fig. 6: Wrapper methods for Task APIs (left) and the implementation of the Coyote task scheduler (right).

Fig. 6 (left) illustrates static wrapper methods for Task.Run and Task.Wait. Notice that on TaskWrapper.Run, no modification to the lambda (func) is required. A task gets created as usual, then gets enqueued to the Coyote task scheduler, which, in turn, executes the task with appropriate calls to the test engine (ExecuteTask). This solution piggybacks on the RunInline functionality that the default scheduler also uses. The TaskWrapper.Wait method adds the call to OnWorkerPaused.

What about implicitly created tasks? This required more digging into the C# compiler to understand the compilation of async methods to state machines [52]. Fortunately, all we required is to identify the point where continuation tasks are created by these state machines, and instead call a wrapper method (similar to TaskWrapper.Run) that enqueues the task to the Coyote task scheduler.

### 4.1 Binary Rewriting for C# Tasks

Binary rewriting is necessary to provide a push-button experience for Coyote on TAP programs. In C#, code gets compiled into the Common Intermediate Language (CIL) [17], which is an object-oriented machine-independent bytecode language that can run on top of the .NET runtime in any supported operating system (Windows, Linux and macOS). Each compiled C# program consists of one or more CIL binaries. Each binary contains an assembly, which is a unit of functionality implemented as a set of types (these can be exposed publicly to be consumed by other assemblies). Each type might contain members such as fields and methods, and so on.

We implemented the binary rewriting engine on top of Cecil [46], an open-source .NET library that provides a rich API for rewriting CIL code. The rewriting engine architecture is illustrated in Fig. 7. The engine loads all program binaries from disk to access the CIL assemblies in memory, topologically sorts them (to ensure that dependencies are processed first), and then traverses each assembly (using the visitor pattern) to apply a sequence of CIL rewriting passes, where each pass focuses on a different type of instrumentation.

Fig. 7: The architecture of the Coyote rewriting engine (left). The interface of a CIL rewriting pass (right).

Each rewriting pass implements the Coyote Pass interface, which is listed in Fig. 7. The rewriting engine visitor will traverse the CIL assembly and invoke the corresponding pass method for each encountered type, field, method signature, as well as each variable and instruction in each method body.
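The engine loop described above (topological sort, then a sequence of passes per assembly) can be sketched in Python as follows. This is a simplified illustration, not the Cecil API or Coyote's code: assemblies are modeled as plain lists of `(opcode, operand)` pairs, and `Pass`, `RenameCallPass`, and `rewrite` are hypothetical names.

```python
from graphlib import TopologicalSorter


class Pass:
    """Hypothetical pass interface with a single hook for instructions."""

    def visit_instruction(self, instr):
        return instr


class RenameCallPass(Pass):
    """Redirects calls from one method name to another."""

    def __init__(self, old, new):
        self.old, self.new = old, new

    def visit_instruction(self, instr):
        op, target = instr
        if op == "call" and target == self.old:
            return (op, self.new)
        return instr


def rewrite(assemblies, deps, passes):
    """assemblies: name -> instruction list; deps: name -> set of dependencies."""
    # static_order() yields dependencies before the assemblies that use them.
    order = list(TopologicalSorter(deps).static_order())
    for name in order:
        for p in passes:  # passes are applied in sequence
            assemblies[name] = [p.visit_instruction(i) for i in assemblies[name]]
    return order
```

A real pass would also visit types, fields, method signatures, and variables; the sketch keeps only the instruction hook to stay short.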

**Built-in Rewriting Passes.** Coyote implements and invokes, in order, the following four passes: type rewriting pass, task API rewriting pass, async rewriting pass, and inter-assembly invocation rewriting pass. The type rewriting pass is responsible for replacing certain C# system library types in the user program with corresponding drop-in-replacement types that are implemented by Coyote. The replacement types implement exactly the same interface as the original types, and invoke the original methods to maintain semantics, but are instrumented with callbacks to the Coyote test engine. Some examples of replaced types are: (1) the System.Threading.Monitor type, which implements the lock statement in C#, and (2) the System.Threading.Semaphore type, which is another variant of a lock. The Coyote versions of these types invoke the test engine to notify it when a worker acquires or releases a lock. These two are the synchronization primitives that Coyote supports by default, in addition to Task APIs. Adding support for more synchronization primitives requires adding another type rewriting pass.

The task API rewriting pass inserts calls to the Coyote.TaskWrapper wrapper type, as discussed earlier. The async rewriting pass is similar, except for wrapping APIs that create implicit tasks. Finally, the inter-assembly invocation rewriting pass is responsible for identifying invocations in the code that are made across CIL assembly boundaries, where the target assembly is not rewritten by Coyote. Coyote adds instrumentation to detect (and tolerate) uncontrolled concurrency (see §5).

New passes that implement the Pass interface can be easily integrated in the current pipeline of passes, allowing power users to extend coyote rewrite for custom rewriting (e.g., to support controlling a new synchronization type without having to manually use the Coyote instrumentation API).

**Design Considerations.** We decided to target CIL for instrumentation instead of doing it at the level of ASTs. This helps reduce the instrumentation scope because the CIL instruction set is much smaller than C# surface syntax. Furthermore, CIL changes infrequently (last update was in 2012 [17]), and we can target pre-compiled binaries without access to their source code.

### 5 Additional Features

**Partially-Controlled Exploration.** As mentioned in §2, Coyote requires tests to be deterministic modulo the concurrency that it controls. This requirement can be broken when the test creates a worker without reporting it to the Coyote test engine, which impacts the ability of Coyote to reproduce an execution. This can happen when using APIs outside of the TAP programming model or by calling into a library that has not been rewritten. Partially-controlled exploration allows the controlled part of a program to be tested with high coverage, even when interacting with an uncontrolled part. In fact, Coyote recommends that developers rewrite only their test binaries and the binaries of their production code, and leave the binaries of any external dependencies unmodified (to be handled by partially-controlled exploration).

During partially-controlled exploration, Coyote will treat any un-rewritten binaries as "pass-through", and their methods are invoked atomically from the perspective of the tool. In this testing mode, Coyote sequentializes the execution of the controlled workers, as usual, and if a controlled worker invokes a method in an un-rewritten binary, or waits on a task that was earlier returned by a method from a non-rewritten binary, or invokes an unsupported low-level C# concurrency API, then Coyote detects this and invokes ScheduleNextWorker to explore a scheduling decision. Instead of immediately trying to choose a controlled worker to schedule, Coyote uses a (tunable) heuristic that gives a chance to wait for the uncontrolled task or invocation to first complete, before trying to resolve the scheduling decision. This is important because instead of regressing coverage, it allows Coyote to cover scenarios where completing the uncontrolled task or invocation first results in new states of the state space being available for exploration.

**Setting max-steps.** Some tests are potentially non-terminating, i.e., some executions of the test go on forever. Non-termination arises naturally when a program has spin loops or polling loops (loops that keep going until some condition is met), and it is unavoidable in consensus protocols like Paxos or Raft, which cannot rule out infinite executions. coyote test provides the option of setting a bound on the length of a test iteration in terms of the number of scheduling points that it hits. This bound is supplied with the max-steps flag. The test engine keeps a count of the number of scheduling points in the current iteration. When it hits the max value, the test engine throws an exception in all of the workers (that would currently be blocked by the engine). This exception essentially kills the worker by propagating all the way up to the test harness, where it is caught by the engine. Once all workers are killed, the engine starts the next iteration.

This solution, of throwing an exception to kill a worker, only works when the worker does not catch the exception to try and resume the execution. All exceptions in C# must derive from the System.Exception type, and a construct like catch(Exception) will catch all exceptions. Coyote gets around this problem by using a binary rewriting pass that edits all catch statements to disallow catching of Coyote exceptions.
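The interplay between the step bound and catch-all handlers can be sketched in Python. This is a hedged illustration only: `ExecutionCanceled`, `Engine`, and `polling_worker` are hypothetical names, and the `isinstance` guard merely mimics the kind of check that a rewriting pass could insert into a catch-all handler.

```python
class ExecutionCanceled(Exception):
    """Hypothetical engine exception used to unwind a worker."""


class Engine:
    def __init__(self, max_steps):
        self.max_steps = max_steps
        self.steps = 0

    def scheduling_point(self):
        self.steps += 1
        if self.steps >= self.max_steps:
            raise ExecutionCanceled()


def polling_worker(engine):
    """A worker whose loop never terminates on its own."""
    while True:
        try:
            engine.scheduling_point()
        except Exception as ex:
            # Inserted guard: engine exceptions must not be swallowed.
            if isinstance(ex, ExecutionCanceled):
                raise
            # The original handler body would run here.


def run_iteration(max_steps):
    """Plays the role of the test harness for a single iteration."""
    engine = Engine(max_steps)
    try:
        polling_worker(engine)
    except ExecutionCanceled:
        return engine.steps  # the iteration ends once the worker is unwound
```

Without the guard, the catch-all handler would swallow the engine exception and the loop would continue forever.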

**Thread-safety violations.** A thread-safety violation occurs in a program when it concurrently invokes some library API that is not designed to be thread safe. Prior work showed the prevalence of such errors in .NET programs when accessing data structures such as dictionaries and lists in the System.Collections.Generic namespace [33]. These data structures do not offer thread-safe APIs. (In concurrent scenarios, one should instead use the data structures in the System.Collections.Concurrent namespace.)

Coyote offers the ability to catch such errors. It implements a rewriting pass that replaces such a data structure, say Dictionary, with a drop-in replacement type WrapperDictionary. The latter keeps track of concurrent (write-write or write-read) accesses and throws an exception when two such accesses happen simultaneously. The exception causes Coyote to report a test failure.
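A minimal Python sketch of such a detecting wrapper is shown below. It is not Coyote's WrapperDictionary: the class `WrapperDict`, the exception type, and the counter-based bookkeeping are assumptions chosen to illustrate how write-write and write-read overlaps could be flagged.

```python
import threading


class ThreadSafetyViolation(Exception):
    """Raised when an access overlaps with a write."""


class WrapperDict:
    def __init__(self):
        self._data = {}
        self._guard = threading.Lock()  # protects only the counters
        self._readers = 0
        self._writers = 0

    def _enter(self, write):
        with self._guard:
            # A conflict is any overlap in which at least one access is a write.
            if self._writers > 0 or (write and self._readers > 0):
                raise ThreadSafetyViolation("concurrent access involving a write")
            if write:
                self._writers += 1
            else:
                self._readers += 1

    def _exit(self, write):
        with self._guard:
            if write:
                self._writers -= 1
            else:
                self._readers -= 1

    def __setitem__(self, key, value):
        self._enter(write=True)
        try:
            self._data[key] = value
        finally:
            self._exit(write=True)

    def __getitem__(self, key):
        self._enter(write=False)
        try:
            return self._data[key]
        finally:
            self._exit(write=False)
```

Concurrent reads are allowed to overlap, which matches the write-write and write-read conflict classes named above.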

**Actor runtime.** Coyote offers a library, inspired by the P# [11] line of work, that allows a developer to use actors to express concurrency in their program. Actors, when created, run concurrently with respect to other actors. They continue to be alive unless explicitly halted. Each actor has an inbox where it listens for messages from other actors and processes them in FIFO order. Several production systems have been built with Coyote's actor framework [12]. The actor runtime takes care of calling the test engine instrumentation APIs at the appropriate points, such as when creating an actor or sending a message to another actor. Hence, no rewriting is required. The Coyote test engine treats tasks and actors the same way, allowing a developer to freely mix the two programming models, i.e., test programs that use both actors and tasks.

### 6 Evaluation

Our evaluation covers three experiments, each on a different set of benchmarks. Each benchmark is a concurrent program with a known bug. We measure the effectiveness of Coyote by the number of times that it is able to hit the bug within a fixed number of test iterations. For each benchmark, we report its degree of concurrency (DoC), defined as the maximum number of simultaneously enabled workers, and the number of scheduling decisions (#SD), i.e., number of times the exploration strategy is invoked on average per test iteration.

The first experiment compares the performance of PCT<sup>t</sup> against PCT on task-heavy programs. We took a proprietary production service of Microsoft, which we call ProdService. The service runs as part of the Azure platform; it is roughly 54K lines of C#, and is designed to be highly concurrent for high throughput. The owning engineering team was routinely running Coyote on multiple concurrency tests. We took an intermediate version of this service and ran all tests with RW, PCT and PCT<sup>t</sup>, with 1000 iterations each. There were a total of 111 tests, out of which 21 tests reported a failure (i.e., bug) with some strategy. The comparison is shown in Table 1. (We actually ran both PCT and PCT<sup>t</sup> with multiple different values of the d parameter, and selected the best among them for each strategy; this value turned out to be d = 10 for both.)

Table 2: Results from testing buggy protocol implementations. Number of test iterations was set to 10K, except for FailureDetector and Paxos that used 100K iterations. PCT, PCT<sup>t</sup> and DB use the bound d = 10.

Table 1 shows superior performance of PCT<sup>t</sup>. It is able to find 17 test failures, compared to 13 for PCT and 9 for Random. Furthermore, on tests that failed with both PCT and PCT<sup>t</sup>, the latter found the bug 9 times more often (geo mean). We observe that these tests created many tasks, roughly 277 tasks (geo mean) in each test iteration, which throws off PCT. With PCT<sup>t</sup>, the number of chains was 6 times smaller (geo mean). Running these 21 tests for 1000 iterations each takes roughly 50 min (wall clock) on a 16-core AMD EPYC (2.6 GHz) VM, running Ubuntu 20.04 on Azure, when utilizing 14 threads on the machine to run tests in parallel.

The second experiment is on buggy protocol implementations from prior work [48,40], shown in Table 2. This experiment evaluates a wider range of strategies. Three schedulers (PCT, PCT<sup>t</sup> and DB) find all the bugs, but none is a clear winner. A combination of schedulers is likely required for reliably finding bugs in a small number of iterations.

The final experiment is to show that Coyote is indeed state-of-the-art by comparing against other tools. We did not find any other CCT tool for C#, so we instead took an established benchmark suite SCTBench [53] of C/C++ programs that use pthreads for concurrency, and manually ported some of them to C# (Table 3), replacing pthreads APIs with Task APIs. These benchmarks have potentially racy shared variables, so we implemented an experimental binary rewriting pass in Coyote that adds scheduling points on heap accesses, to ease the porting exercise. A direct comparison with prior tools is difficult because there can still be subtle differences in how scheduling points get inserted. Regardless, we note that numbers for POS are roughly in agreement with its original paper [56] and numbers for PCT and RW are in agreement with a prior empirical study [53]. (Note that PCT<sup>t</sup> is identical to PCT on these benchmarks because there are no task continuations.) Our implementation of POS performs better than the original one, but the original implementation is unavailable for us to make a more accurate assessment. This comparison is useful to ground Coyote with respect to related work.

The code and scripts to run all the non-proprietary experiments from this paper are available as an artifact on Zenodo [14].

### 7 Related Work

The term controlled concurrency testing (CCT) was coined only recently [53] but it inherits its roots from stateless model checking (SMC) that was popularized by VeriSoft [24]. Stateful approaches require the ability to record the state of an executing program; this is hard to achieve for production code, consequently stateful checking tools [26,6] are often applied to models of code that are written in custom languages. SMC/CCT, on the other hand, only record the sequence of actions taken during an execution, making them the technique of choice for directly testing code written in commercial languages (like C#).

Research in SMC/CCT can further be classified in two categories. One category is of exhaustive techniques, where the goal is to explore the entire state space of a program (in reality, it is the state space of a fixed test that invokes a bounded workload on the program), and obtain a verified verdict. Exhaustive techniques are based on the notion of partial order reduction (POR) [24] that constructs equivalence classes of executions so that only one exploration per equivalence class is required [35]. Recently, this line of work has produced several tools, such as CDSChecker [47], GenMC [30], and Nidhugg [2], that have demonstrated value in verifying concurrency primitives (e.g., latches, mutex implementations) and concurrent data structures, especially when considering weak memory behaviors [1,28,29].

The other category for SMC/CCT consists of techniques aimed towards bug-finding. These techniques are either bounded (i.e., aim to explore only a subset of the executions) or randomized or both. By lowering expectations (i.e., not insisting on covering the entire state space), these techniques can be applied on larger systems. We have discussed several instances of these techniques throughout this paper. The first work that popularized bug-finding was the notion of context-bounded exploration [41]. Coyote borrows heavily from this line of work on bug-finding techniques, which is evident in the set of exploration strategies that it supports. Implementing POR-based strategies is possible; the POS strategy already takes Coyote in this direction. The absence of exhaustive techniques has (so far) not been felt by users of Coyote, likely because the usage scenarios have neither focused on weak memory behaviors (more present in C/C++ rather than C#), nor on verifying concurrent data structures. Nonetheless, supporting POR-based techniques remains an important direction for future work.

Related to the idea of CCT for bug-finding are noise-injection-based techniques [21,20,18]. These techniques rely on perturbing the execution of a concurrent program by injecting noise such as sleep statements, which force the execution to explore alternative interleavings. Unlike CCT, no control is required on concurrent workers, hence these techniques have simpler engineering requirements. However, the tradeoff is that the loss of control implies that the ability to explore specific interleavings, such as what PCT requires, is reduced. The ANaConDA tool has successfully demonstrated noise-injection in an industrial setting [21]. It can be interesting to explore the use of noise injection to provide coverage in portions of code that are not controlled by Coyote.

The CHESS tool [41], to the best of our knowledge, was the only other CCT tool to support C#. CHESS is currently not in a usable state. It was designed prior to the popularity of TAP in C#, thus had no special support for tasks. In terms of implementation, it occupied a different design space than Coyote. It relied on interception of C# threading APIs and redirecting them to custom mocks. Maintenance of these mocks was an engineering cost. Furthermore, the interception technology relied on a framework [36] that also went out of support. This showcases that the complexity of supporting C# must be met with good engineering, built on stable frameworks. Coyote is also more extensible, both in terms of programming frameworks, as well as exploration strategies.

**Acknowledgements.** The authors would like to thank everyone who has contributed to Coyote over the years. This includes many open-source contributors that have filed issues and fixes, as well as developers that have integrated Coyote into their engineering process to provide valuable insights on what concurrency testing can and should do. We would especially like to thank Immad Naseer for his help with ProdService.

#### References


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Context-Sensitive Meta-Constraint Systems for Explainable Program Analysis**

Kalmer Apinis and Vesal Vojdani

Institute of Computer Science, University of Tartu, Narva mnt 18, EE-51009 Tartu, Estonia {kalmera,vesal}@ut.ee

**Abstract.** We show how to generate a constraint system of symbolic expressions as part of an inter-procedural constraint-system–based program analysis such that any chosen slice of the intended analysis may be computed through the evaluation of the symbolic constraints. Thus, our method ensures that the computed expressions provide genuine explanations for the chosen analysis slice. The resulting system is then annotated with program location information, translated into closed-form expressions, and simplified to yield a human-readable justification for the analyzer's verdict. Justifications are given using program locations, constants from the program, abstract lattice operations, loops in the analysis, and computed results.

**Keywords:** Program analysis, Data-flow analysis, Constraint systems, Abstract domains, Explainability

### **1 Introduction**

When a program analysis tool identifies a flaw in the program, it is often possible to produce a counterexample execution trace that is useful for debugging the program. As noted by the founders of model checking, "it is impossible to overestimate the importance of this feature" [13]. In contrast, when a sound analyzer verifies the absence of errors, it does not produce an equivalent human-readable artifact to explain this verdict. The challenge is to explain why a property holds along *all* possible executions of the program in a way that is understandable to users of the tool.

A simple example of explaining an invariant is seen in Fig. 1, where the code inspection of IntelliJ IDEA explains the reason for a boolean guard being always false. This is elegant, and we aim to generalize this idea to explain verdicts that rely on inductive invariants. IntelliJ does not explain more complicated analyses than simple constant propagation.

Fig. 1: Explanation in IntelliJ IDEA.

© The Author(s) 2023

https://doi.org/10.1007/978-3-031-30820-8\_27 S. Sankaranarayanan and N. Sharygina (Eds.): TACAS 2023, LNCS 13994, pp. 453–472, 2023.

Fig. 2: Explanation generated for Interval Analysis by the Põder analyzer.

The usability aspects of sound static analyzers deserve more research attention, especially as decades of work have been put into the more technical aspects of analysis theory and tool design. Empirical studies suggest that poor explainability of analysis results is as serious an obstacle as false positives in preventing the wider adoption of static analysis tools [12, 22, 28]. We take a first step in this direction by providing a general framework for explaining abstract interpreters. We then instantiate this framework to generate explanations for interval analysis with widening and narrowing iterations. A prototype implementation of our approach is available in the static analysis framework Põder<sup>1</sup>. In Fig. 2 we see the results of Põder analyzing a Java bytecode program. On the right-hand side, the solved interval value of the field x on line 17 is shown together with reasoning on how the value was computed. The example program and its (interval) analysis are explained in Section 4 (Example 1); the justification is explained in Section 6.

Explanations for simple analyses can be useful in practice. In our previous work on static analysis for Linux device drivers [34], we spent countless hours determining why the analyzer claimed that a portion of the code is definitely unreachable. Rather than relying on ad-hoc methods to trace the computation of the analyzer, we aim to build an analyzer with explainability as a core consideration. We identify two desirable functional requirements that a framework for explainable static analysis should satisfy:

**Result consistency.** Computing explanations should not influence the result of the actual analysis.

<sup>1</sup> Available via artifact [6] or Bitbucket: https://bitbucket.org/kalmera/poder.

**Explanation consistency.** The explanation should be semantically consistent with the result of the analysis.

The key contribution of this paper is a framework for explainable analysis that prioritizes explanation consistency. The analysis will operate with symbolic expressions, which can be directly translated into explanations, and crucially, the result of the analysis is based on evaluating these expressions. This ensures explanation consistency by construction.

The proposed method fits into the framework of A<sup>2</sup>I (also called meta-abstract interpretation) described by Cousot et al. [18]. A simplified view of A<sup>2</sup>I is that the analysis is divided into two instances of abstract interpretation: a meta-analysis and an underlying analysis. The benefit of this approach is that one can reason about the soundness of the meta-analysis with the same formalism as the analysis itself. In our case, the meta-analysis generates analysis expressions and the underlying analysis evaluates them.

*The structure of the paper.* We introduce the formal setting in Section 2, and give abstract definitions for explainable analysis in Section 3. The main contribution is introduced via an example in Section 4 — transforming interval analysis to additionally gather interval expressions. Several examples are presented. The post-processing of generated expressions into closed form is shown in Section 5. In Section 6, we discuss our prototype implementation in Põder. Next, in Section 7 we discuss limitations of the current implementation and possibilities for applying our approach in various settings. Related work is described in Section 8, after which we conclude.

### **2 Data-flow Analysis**

A *program* is a set of functions Fun containing main. Each function *f* ∈ Fun is represented by its *Control Flow Graph* (*N<sub>f</sub>*, *E<sub>f</sub>*, *f<sub>begin</sub>*, *f<sub>end</sub>*) where *N<sub>f</sub>* is a finite set of program points and *E<sub>f</sub>* ⊆ *N<sub>f</sub>* × *L* × *N<sub>f</sub>* is the set of labeled edges (*u, l, v*). Each function has a unique source *f<sub>begin</sub>* and a unique sink *f<sub>end</sub>*. The label set *L* represents program statements including (but not limited to) function calls as well as conditional guards. We assume that CFG nodes of distinct functions are distinct, so we can leave out subscripts from *N<sub>f</sub>* and *E<sub>f</sub>*.

A complete lattice (*D*, ⊆) is a partial order in which each set *D*′ ⊆ *D* has a least upper bound ⨆ *D*′ [9]. We know that any complete lattice must have a unique least element ⊥ := ⨆ ∅ and a unique greatest element ⊤ := ⨆ *D*.

A constraint system is a set of variables *V* where each variable *v* ∈ *V* may be constrained using (⊆) by an expression *f<sub>v</sub>* over variables *V*. The expression *f<sub>v</sub>* is formalized as a function (*V* → *D*) → *D*. A (partial) mapping *σ* : *V* → *D* is a (partial) solution to a constraint system if for all variables *v* in the domain of *σ* it holds that *σ*(*v*) ⊇ *f<sub>v</sub>*(*σ*).
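To make the notion of a solution concrete, here is a minimal Python sketch of a round-robin solver. It is an illustration under stated assumptions, not the solver used by the authors: it computes a total solution rather than a partial one, it terminates only when the lattice has no infinite ascending chains, and `solve`, `bottom`, `join`, and `leq` are hypothetical names for the lattice operations.

```python
def solve(constraints, bottom, join, leq):
    """Iterate until every variable's value is above its right-hand side.

    constraints maps each variable v to a function f_v that takes the
    current mapping sigma and returns a lattice value.
    """
    sigma = {v: bottom for v in constraints}
    changed = True
    while changed:
        changed = False
        for v, f in constraints.items():
            new = f(sigma)
            if not leq(new, sigma[v]):
                sigma[v] = join(sigma[v], new)  # move up just enough to cover f_v(sigma)
                changed = True
    return sigma


# Example on the powerset lattice of {1, 2}, ordered by set inclusion.
constraints = {
    "x": lambda s: frozenset({1}) | (s["y"] & frozenset({2})),
    "y": lambda s: s["x"] | frozenset({2}),
}
sigma = solve(
    constraints,
    bottom=frozenset(),
    join=lambda a, b: a | b,
    leq=lambda a, b: a <= b,
)
```

On termination, every variable satisfies the solution condition stated above, because the loop exits only after a full pass in which no right-hand side exceeds the current value.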

Let *S* denote the set of all possible concrete program states. We can formulate the collecting semantics for the set of states reachable by the program, using the functional approach [33] to include states reachable through interprocedurally valid paths only. The constraint system variable [*v, d*] consists of a program point *v* ∈ *N<sub>f</sub>* together with a set of program states *d*, representing states at the beginning of the function *f*. For each *d* ∈ 2<sup>*S*</sup> = *D* we have constraints:

$$\begin{aligned} \left[f\_{begin},d\right] & \supseteq d & \forall f \in \mathsf{Fun} \\ \left[v,d\right] & \supseteq [\![e]\!](\left[u,d\right]) & \forall e = \left(u,l,v\right) \in E \\ \left[v,d\right] & \supseteq \bigcup\_{d' \in \left[u,d\right]} \mathsf{comb}\_e(d', \left[f\_{end}, \mathsf{enter}\_e(d')\right]) & \forall e = \left(u,x := f(\ldots),v\right) \in E \end{aligned}$$

The value of a constraint system variable [*f<sub>begin</sub>, d*] is the set of states that reach the beginning of *f* with the assumption that the start of *f* can be reached in states *d*. Thus, the first constraint is trivial. For non–function-call edges a distributive transfer function ⟦·⟧ : *E* → (2<sup>*S*</sup> → 2<sup>*S*</sup>) is applied which translates labels to transformations of program state sets. For calls to *f* ∈ Fun in edges *e* = (*u, x* := *f*(…), *v*) ∈ *E* two distributive functions are used: enter<sub>*e*</sub> and comb<sub>*e*</sub>. First, caller states are translated to callee starting states using enter<sub>*e*</sub> : *S* → *S*, and then caller states together with called function end-states are translated to returning states using comb<sub>*e*</sub> : *S* → 2<sup>*S*</sup> → 2<sup>*S*</sup>.

Given the least partial solution *σ*, the set of reaching states for each program point *u* is the union of values of *σ*[*u, d*] that [main<sub>*end*</sub>, *d*<sub>0</sub>] (recursively) depends on. Note that we prefer partial solutions over total solutions as we want to avoid unreachable contexts. Thus, we have for each CFG node the set of program states that this node may be reached with. We have proven partial correctness if erroneous states cannot be reached. Reachable program state sets, however, are in general not practically computable. So instead of sets of states, we use a different complete lattice so that a single abstract value describes a whole set of concrete program states.

The correspondence of program states and the chosen complete lattice elements is formalized using a *description relation* (Δ) ⊂ *S* × *D* [32], i.e., we write *s* Δ *d* if the program state *s* ∈ *S* is described by the abstract state *d* ∈ *D*. We require that the least element ⊥ should not describe any program state and the greatest element ⊤ must describe all concrete program states. The description relation must also reflect the ordering of the lattice: *s* Δ *d*<sub>1</sub> ∧ *d*<sub>1</sub> ⊆ *d*<sub>2</sub> ⟹ *s* Δ *d*<sub>2</sub>. For sound analysis we require an abstract version of the semantics function that agrees with the concrete semantics:

$$\begin{aligned} s \mathrel{\Delta} d \wedge s' \in [\![e]\!](\{s\}) &\implies s' \mathrel{\Delta} [\![e]\!]^\sharp(d) \\ s \mathrel{\Delta} d \wedge s' \mathrel{\Delta} d' \wedge s'' \in \mathsf{comb}\_e(s, s') &\implies s'' \mathrel{\Delta} \mathsf{comb}\_e^\sharp(d, d') \\ s \mathrel{\Delta} d &\implies \mathsf{enter}\_e(s) \mathrel{\Delta} \mathsf{enter}\_e^\sharp(d) \end{aligned}$$
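As a small worked instance of these requirements (an illustration chosen here, assuming a single integer variable and an increment statement, neither of which is fixed by the text at this point), let concrete states be integers and abstract states be intervals:

$$\begin{aligned} s \mathrel{\Delta} [a,b] &\iff a \le s \le b \\ [a,b] \text{ below } [a',b'] &\iff a' \le a \wedge b \le b' \\ [\![x := x + 1]\!]^\sharp([a,b]) &= [a+1,\, b+1] \end{aligned}$$

The ordering requirement holds because $a \le s \le b$ together with $a' \le a$ and $b \le b'$ gives $a' \le s \le b'$. The first soundness condition holds because $a \le s \le b$ implies $a + 1 \le s + 1 \le b + 1$, so the concrete successor $s + 1$ is described by the abstract successor.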

For non-recursive programs, the most precise partial solution is computable if *D* does not contain infinite ascending chains. In the case of infinite ascending chains, we can find a partial solution that is not necessarily the most precise [4]. Either way, any partial solution is a sound over-approximation of the collecting semantics. Thus, we have proven partial correctness if the computed partial solution does not contain any abstract state that describes a concrete error state.

As a side note, if program graphs contain an equivalent of dynamic goto instructions, full CFGs might be impractically large. Then it is advantageous to explore the CFG lazily starting from main<sub>*begin*</sub>, for example, using the function next : *N* → *D* → 2<sup>*L*×*N*</sup> that gives for each node *n* and abstract state *d* the set of reached nodes and their corresponding edge labels from *n* [33]. For manageable-sized CFGs, an off-the-shelf local solver can also be used in practice to produce the partial solution [2, 31].

#### **3 Meta-analysis for explanations**

Data-flow analysis with a finite number of constraint system variables can be succinctly formalized as a single vectorized constraint

$$
\bar{x} \supseteq F\_p^\sharp(\bar{x}) \tag{1}
$$

where a post-fixpoint of *F*<sub>*p*</sub><sup>♯</sup> contains true statements about the program *p*. In general, there is no easy way to succinctly explain how a member of the vector *x̄* is computed without expensive inspection of the function *F*<sub>*p*</sub><sup>♯</sup>. Instead, we propose to apply meta-abstract interpretation and split the analysis into two constraints

$$\begin{aligned} \bar{y} &\supseteq G\_p^\sharp(\bar{x}) \\ \bar{x} &\supseteq E^\sharp(\bar{x}, \bar{y}) \end{aligned} \tag{2}$$

where, first, the function *G*<sub>*p*</sub><sup>♯</sup> generates expressions and, second, the function *E*<sup>♯</sup> evaluates the generated expressions.

**Definition 1 (Result & Explanation Consistency).** *Let x̄<sub>1</sub> be a solution of System 1, and (ȳ<sub>2</sub>, x̄<sub>2</sub>) be a solution of System 2. We say the result is consistent iff x̄<sub>1</sub> = x̄<sub>2</sub>. And the explanation is consistent iff x̄<sub>2</sub> = E<sup>♯</sup>(x̄<sub>2</sub>, ȳ<sub>2</sub>). Jointly, these properties ensure that ȳ<sub>2</sub> is a valid explanation for the computation of x̄<sub>1</sub>.*

The functions *G*<sub>*p*</sub><sup>♯</sup> and *E*<sup>♯</sup> must be implemented in a way that guarantees *result consistency*; explanation consistency is guaranteed by construction for the least solution. As the resulting construction is in the form of a constraint system, it may be combined with other constraint-system-based analyses into a single constraint system. Thus, the analysis designer can choose to generate explanations about the (sub-)analysis where it is considered beneficial.
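The split into a generating step and an evaluating step can be sketched in Python on a tiny, hypothetical straight-line program (`a = 1; b = a + 2; c = join(a, b)`). This is an illustrative toy under stated assumptions, not the authors' implementation: intervals are pairs, expressions are nested tuples, and, as a simplification, the generator here does not depend on the computed values.

```python
def add(i, j):
    return (i[0] + j[0], i[1] + j[1])


def join(i, j):
    return (min(i[0], j[0]), max(i[1], j[1]))


def direct():
    """Plain analysis: computes interval values and keeps no trace."""
    a = (1, 1)
    b = add(a, (2, 2))
    c = join(a, b)
    return {"a": a, "b": b, "c": c}


def generate():
    """Meta-analysis: one symbolic expression per unknown."""
    return {
        "a": ("const", 1, 1),
        "b": ("add", ("var", "a"), ("const", 2, 2)),
        "c": ("join", ("var", "a"), ("var", "b")),
    }


def evaluate(expr, x):
    """Underlying analysis: evaluates an expression against the values x."""
    tag = expr[0]
    if tag == "const":
        return (expr[1], expr[2])
    if tag == "var":
        return x[expr[1]]
    lhs, rhs = evaluate(expr[1], x), evaluate(expr[2], x)
    return add(lhs, rhs) if tag == "add" else join(lhs, rhs)


def solve_meta():
    y = generate()
    x = {}
    for name in ("a", "b", "c"):  # evaluation in dependency order
        x[name] = evaluate(y[name], x)
    return y, x
```

In this toy setting, result consistency means `direct()` and the values from `solve_meta()` coincide, and explanation consistency means each value equals the evaluation of its own expression.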

One standard example of a complete lattice is the box domain: a mapping from program variables to integer intervals. For this domain, the analysis produces bounds for integer variables that may be used to warn the user if array accesses are not within bounds. In practice, however, programs use dynamic language features such as function pointers, dynamic memory, multi-threading, etc., and more information must be stored in the domain than just intervals. Thus, we should also show how interval analysis can use and provide information to other analyses. In the next section, we propose a process to modify an analysis to that effect.
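As a minimal sketch of the box domain just described, the following Python fragment models an abstract state as a map from variable names to inclusive intervals, with `None` standing for the unreachable state. The names `join_box`, `leq_box`, and `in_bounds` are illustrative assumptions and do not come from the paper; the sketch also assumes that both boxes range over the same variables.

```python
from typing import Dict, Optional, Tuple

Interval = Tuple[float, float]          # inclusive bounds [a, b] with a <= b
Box = Optional[Dict[str, Interval]]     # None encodes the unreachable state (bottom)

def join_itv(p: Interval, q: Interval) -> Interval:
    """Least upper bound of two intervals (their convex hull)."""
    return (min(p[0], q[0]), max(p[1], q[1]))

def leq_box(a: Box, b: Box) -> bool:
    """Pointwise lattice order; bottom is below every element."""
    if a is None:
        return True
    if b is None:
        return False
    return all(b[v][0] <= a[v][0] and a[v][1] <= b[v][1] for v in a)

def join_box(a: Box, b: Box) -> Box:
    """Pointwise join; bottom is the neutral element."""
    if a is None:
        return b
    if b is None:
        return a
    return {v: join_itv(a[v], b[v]) for v in a}

def in_bounds(index: Interval, length: int) -> bool:
    """True if every value described by `index` is a valid array position."""
    return 0 <= index[0] and index[1] < length

state = join_box({"i": (0, 3)}, {"i": (2, 9)})
print(state)                        # {'i': (0, 9)}
print(in_bounds(state["i"], 10))    # True
print(in_bounds(state["i"], 8))     # False
```

The last two lines show the kind of array-bounds check that the interval bounds make possible.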

### **4 Transforming the box domain**

We start with a functional approach [33] analysis where the domain *D* consists of an arbitrary "helper" analysis domain *H* and the box domain.

$$D = H \times (\text{Var} \to \text{I})_\perp$$

The box domain can either be $\bot$, meaning that the program point is not reachable, or a function that maps program variables to inclusive integer intervals $[a, b]$. The lower (upper) bound $a$ ($b$) can be an integer or negative (positive) infinity. We can assume that the lower bound is not larger than the upper bound. The lattice order is defined pointwise with the exception that $\bot$ is the least lattice element. For any context $d \in D$, the constraints for the analysis are the following:

$$\begin{aligned} [f_{begin}, d] &\supseteq d & \forall f \in \mathsf{Fun} \\ [v, d] &\supseteq \llbracket e \rrbracket^\sharp([u, d]) & \forall e = (u, l, v) \in E \\ [v, d] &\supseteq \mathsf{comb}^\sharp_e([u, d], [f_{end}, \mathsf{enter}^\sharp_e([u, d])]) & \forall e = (u, x := f(\ldots), v) \in E \end{aligned}$$

First, the starting point of the function is constrained by the value in the context. The second and third constraints deal with non-call edges and function call edges, respectively. The constraint system is analogous to the constraint system for concrete semantics with the exception that the argument of $\mathsf{enter}^\sharp$ and the first argument of $\mathsf{comb}^\sharp$ represent state sets instead of one particular state.

Fig. 3: Interval analysis of a program with a loop.

*Example 1.* Consider the Java method foo in Fig. 3a and 3b. First, two fields are initialized to the value 0. The field x is incremented in the loop, but the field y is left as is. At the end, field values are printed using the evalInt method. We assume that the helper analysis may conclude that the object pointed to by this will not be visible to other threads. Until a reference to the object escapes to other threads, we can be sure that no other access to these fields can happen during the call to foo. Solving steps are shown in Fig. 3c, where the abstract values of fields x and y at program point $m$ are referred to as $x^m$ and $y^m$, respectively.

Thanks to the helper analysis, we know that the object pointed to by this will not be visible to other threads and, thus, we may consider fields x and y as local variables. At the final node, the value of x is 100 and y is zero. Bold font in Fig. 3c emphasizes a change to the solver variable. The analysis uses widening and narrowing [15] to reach the least partial solution in nine steps.

For novice program analysis tool users, seeing only the final result, it might not be clear how values for x and y are derived. Other users might complain that for iterations 4 to 7, the values for y and *H* are re-computed unnecessarily. In the following sections, we aim to remedy such issues.

#### **4.1 A naive approach to adding expressions**

To add interval expression information for each program point, we use, instead of $D$, the domain $D'$ consisting of the helper analysis domain, interval expressions, and interval values.

$$D' = H \times (\text{Var} \to \mathbb{E}^{\sharp})_{\perp} \times (\text{Var} \to \text{I})_{\perp}$$

For abstract expressions we use values of the form $\mathsf{join}(S) \in \mathbb{E}^\sharp$ where $S \in 2^{\mathcal{E}}$ is a set of expressions defined using the following grammar:

$$\mathcal{E} ::= [N, D', \text{Var}] \mid \mathcal{F}(\mathcal{E}^*) \mid \top$$

The ordering is defined as $\mathsf{join}(X) \subseteq \mathsf{join}(Y) := \top \in Y \lor X \subseteq Y$. The variable $[n, d, x]$ (written as $x^n$ in the examples where the context can be inferred) refers to the value of the program variable $x$ in program point $n$ in context $d$. Furthermore, an expression can be unknown ($\top$) or an $n$-ary function from the set $\mathcal{F}$ together with its argument expressions. It is assumed that $\mathcal{F}$ contains interval constants as nullary functions. The expression mapping $k \in \text{Var} \to \mathbb{E}^\sharp$ can be evaluated for each variable evaluation $\rho \in N \times D' \times \text{Var} \to \text{I}$ using $\llbracket k \rrbracket^\sharp_{\mathcal{E}}(\rho) = \lambda x.\, \llbracket k(x) \rrbracket^\sharp_{\mathcal{E}}(\rho)$ where

$$\begin{aligned} \llbracket \mathsf{join}(S) \rrbracket^\sharp_{\mathcal{E}}(\rho) &= \bigsqcup \{ \llbracket s \rrbracket^\sharp_{\mathcal{E}}(\rho) \mid s \in S \} \\ \llbracket \top \rrbracket^\sharp_{\mathcal{E}}(\rho) &= [-\infty, \infty] \\ \llbracket f(s_1, \ldots, s_n) \rrbracket^\sharp_{\mathcal{E}}(\rho) &= f(\llbracket s_1 \rrbracket^\sharp_{\mathcal{E}}(\rho), \ldots, \llbracket s_n \rrbracket^\sharp_{\mathcal{E}}(\rho)) \\ \llbracket [n, d, x] \rrbracket^\sharp_{\mathcal{E}}(\rho) &= \rho(n, d, x) \end{aligned}$$
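The evaluation rules above can be sketched as a small recursive interpreter. This is only an illustration under stated assumptions: the class names (`Top`, `Const`, `Var`, `Fun`, `Join`), the function table `FUNCTIONS`, and the use of `None` for the empty join are hypothetical choices, and interval constants get their own class here for brevity although the text treats them as nullary functions.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Top:              # the unknown expression
    pass

@dataclass(frozen=True)
class Const:            # interval constant (a nullary function in the text)
    lo: float
    hi: float

@dataclass(frozen=True)
class Var:              # [n, d, x]: variable x at program point n in context d
    n: int
    d: str
    x: str

@dataclass(frozen=True)
class Fun:              # f(s_1, ..., s_n)
    name: str
    args: tuple

@dataclass(frozen=True)
class Join:             # join(S) for a set S of expressions
    alts: frozenset

def hull(p, q):
    """Interval join; None is the bottom element."""
    if p is None:
        return q
    if q is None:
        return p
    return (min(p[0], q[0]), max(p[1], q[1]))

def add(p, q):
    """Interval addition, strict in bottom."""
    if p is None or q is None:
        return None
    return (p[0] + q[0], p[1] + q[1])

FUNCTIONS = {"add": add}    # hypothetical table of interpreted functions

def evaluate(e, rho):
    """Evaluate expression e under the variable evaluation rho."""
    if isinstance(e, Top):
        return (-math.inf, math.inf)
    if isinstance(e, Const):
        return (e.lo, e.hi)
    if isinstance(e, Var):
        return rho[(e.n, e.d, e.x)]
    if isinstance(e, Fun):
        return FUNCTIONS[e.name](*(evaluate(a, rho) for a in e.args))
    if isinstance(e, Join):
        result = None
        for s in e.alts:
            result = hull(result, evaluate(s, rho))
        return result
    raise TypeError(f"unknown expression: {e!r}")

rho = {(2, "d", "x"): (0, 99)}
expr = Join(frozenset({Const(0, 0), Fun("add", (Var(2, "d", "x"), Const(1, 1)))}))
print(evaluate(expr, rho))   # (0, 100)
```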

This analysis can be implemented directly using the functional approach, i.e., the previously discussed constraint system with the domain $D'$ instead of $D$.



Fig. 4: Analysis of program in Fig. 3a using the domain $D'$.

*Example 2.* When analyzing the program from Fig. 3a using the domain $D'$, we get the iterates shown in Fig. 4. The interval values stay the same w.r.t. analysis using $D$. In addition, we obtain an interval constraint system for integer program variables. The unknown $x^n$ signifies the interval state of program variable $x$ in the program point $n$. Note that without the helper analysis we would need to handle potential write operations from other threads. In general, we have gathered the information on how interval values are computed at each step, but the overview is still lacking. As the expressions refer to several variables, the correspondence and correctness may not be immediately apparent. Also, note that we have increased the amount of unnecessary re-computation (in non-bold font).

#### **4.2 A more sophisticated approach to adding expressions**

The naive approach has two downsides which we aim to overcome. First, we tackle the issue that a buggy analysis may output inconsistent expressions and interval values. Furthermore, a function would be analyzed for each expression and value at the start point, i.e., context, not only for each distinct value. This is excessive as the analyzed program can only access the numeric value — not the way values were computed — and therefore cannot behave differently based on it. Thus, we only store interval values as the context so that the expressions at the start of the function will have literal values.

We use three kinds of constraint system variables instead of triples to reduce unnecessary re-computation. First, helper analysis variables $[u, d]_1$ with values from the domain $H$, which correspond to the first components of $[u, d]$. Second, expression map variables $[u, d]_2$ with values from the domain $(\text{Var} \to \mathbb{E}^\sharp)_\perp$, and finally, interval map variables $[u, d]_3$ with values from the domain $(\text{Var} \to \text{I})_\perp$. Interval values are computed from interval expressions by evaluation as follows, thus guaranteeing that they will agree w.r.t. the solution.

$$[v, d]_3 \supseteq \lambda x.\, \llbracket [v, d]_2(x) \rrbracket_{\mathcal{E}}^\sharp(\lambda (u, d', y).\, [u, d']_3(y)) \qquad v \in N \land d \in D$$

The constraints for non-function-call labels for any $d \in D$ are as follows

$$\begin{aligned} [f_{begin}, (h, k)]_1 &\supseteq h & \forall f \in \mathsf{Fun} \\ [f_{begin}, (h, k)]_2 &\supseteq k & \forall f \in \mathsf{Fun} \\ ([v, d]_1, [v, d]_2) &\supseteq \llbracket e \rrbracket^\sharp(([u, d]_1, [u, d]_3), d) & \forall e = (u, l, v) \in E \end{aligned}$$

Note that the transfer function does not get the expression component as a parameter and does not contribute directly to the interval component. Also, the current calling context is passed on to the function so that it is able to reference variables for this context in the expression component. The calling context may not be used for any other purpose. In addition, we have constraints for any $d \in D$ and for function call edges $e = (u, x := f(\ldots), v) \in E$:

$$\begin{aligned} ([v, d]_1, [v, d]_2) \supseteq{} & \mathtt{let}\ h, k = \mathsf{enter}_e^\sharp([u, d]_1, [u, d]_3, d)\ \mathtt{in} \\ & \mathtt{let}\ d' = (h, \lambda x.\, \llbracket k(x) \rrbracket^\sharp_{\mathcal{E}}(\lambda (u, d', y).\, [u, d']_3(y)))\ \mathtt{in} \\ & \mathsf{comb}^\sharp_e([u, d]_1, [u, d]_3, [f_{end}, d']_1, [f_{end}, d']_3, d, d') \end{aligned}$$

Neither $\mathsf{enter}^\sharp_e$ nor $\mathsf{comb}^\sharp_e$ depends directly on the expression component, and neither contributes directly to the interval component. In addition to the caller's calling context, the new context is passed on to $\mathsf{comb}^\sharp_e$. We assume that the contexts are used only in the expression component to reference variables in the respective contexts. Thus, if the generation of expressions also does not depend on interval values, they will be computed alongside the helper analysis and do not add iteration steps.

The correctness condition of interval analysis with interval expressions can be stated w.r.t. plain interval analysis: the produced expressions must evaluate to intervals that describe all possible states from the collecting semantics. I.e., for any state $s \in [v, c]$ in the collecting semantics, we must ensure that it is described by the analysis, $s \mathrel{\Delta} ([v, d]_1, [v, d]_3)$, where the contexts are related, $c \mathrel{\Delta} d$.

The above is ensured by the framework if the transfer functions are translated into corresponding symbolic expressions. Given an edge $e$, and the original sound abstract function $\llbracket e \rrbracket^\sharp_I$, we now need corresponding symbolic representations, $e^\sharp$. Ignoring detail, in order to ensure result consistency, it is sufficient for our symbolic transfer function to satisfy the condition that $\llbracket e^\sharp \rrbracket^\sharp_{\mathcal{E}}$ must compute the same result as $\llbracket e \rrbracket^\sharp_I$. Similar conditions can be given for inter-procedural analysis functions $\mathsf{enter}^\sharp$ and $\mathsf{comb}^\sharp$. The detailed sufficient conditions for result consistency are stated in the following lemmas.

**Lemma 1 (Intra-Procedural Result Consistency).** *Given for all* $d \in D$ *and* $(u, e, v) \in E$ *where* $e^\sharp = \llbracket e \rrbracket^\sharp(([u, d]_1, [u, d]_3), d)_2$ *only contains variables preceding program point* $u$ *and context* $d$ *such that its evaluation* $\llbracket e^\sharp \rrbracket^\sharp_{\mathcal{E}}(\lambda\_.\,[u, d]_3)$ *is equal to the original interval analysis* $\llbracket e \rrbracket^\sharp_I([u, d]_1, [u, d]_3)$*, then the results of the original intra-procedural analysis and transformed analysis are consistent.*


Fig. 5: Example analysis using separate components of the domain $D'$.

**Lemma 2 (Inter-Procedural Result Consistency).** *In addition to the assumptions of Lemma 1, we require that the generated function entry state* $s^\sharp = \mathsf{enter}^\sharp_e([u, d]_1, [u, d]_3, d)_2$ *evaluates to the same value as in the original analysis, i.e.,* $\llbracket s^\sharp \rrbracket_{\mathcal{E}}(\lambda\_.\,[u, d]_3) = \mathsf{enter}^\sharp_{e,I}([u, d]_1, [u, d]_3)$*. Finally, the generated function return state* $r^\sharp = \mathsf{comb}^\sharp_e([u, d]_1, [u, d]_3, [f_{end}, d']_1, [f_{end}, d']_3, d, d')_2$ *must evaluate to the same value as in the original analysis, i.e.,* $\llbracket r^\sharp \rrbracket_{\mathcal{E}}(\lambda\_.\,[u, d]_3) = \mathsf{comb}^\sharp_{e,I}([u, d]_1, [u, d]_3, [f_{end}, d']_1, [f_{end}, d']_3)$*. Then the results of the inter-procedural original analysis and transformed analysis are consistent.*

A demand-driven constraint system solver would alternate between generating and evaluating expressions, yielding online meta-abstract interpretation [18]. Offline meta-abstract interpretation could be achieved when the generation of expressions does not depend on or even query the results of the expressions' evaluations. A demand-driven constraint system solver could first generate all expressions and then, independently, evaluate them.
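The offline variant can be sketched in two phases. The fragment below is a simplified illustration, not the paper's solver: `generate` stands in for expression generation and simply returns hand-written expressions for the loop of the running example (the names `x2` and `x6` and the tuple encoding are assumptions), and `solve` is a plain round-robin Kleene iteration without widening, which terminates here only because the loop is bounded.

```python
import math

def join(p, q):
    """Interval join; None is bottom."""
    if p is None:
        return q
    if q is None:
        return p
    return (min(p[0], q[0]), max(p[1], q[1]))

def meet(p, q):
    """Interval intersection; empty intersections collapse to bottom."""
    if p is None or q is None:
        return None
    lo, hi = max(p[0], q[0]), min(p[1], q[1])
    return (lo, hi) if lo <= hi else None

def add(p, q):
    """Interval addition, strict in bottom."""
    if p is None or q is None:
        return None
    return (p[0] + q[0], p[1] + q[1])

OPS = {"join": join, "meet": meet, "add": add}

def generate():
    """Phase 1: produce expressions without consulting any interval value."""
    x2 = ("var", "x2")
    return {
        "x2": ("join", ("const", (0, 0)),
               ("add", ("meet", x2, ("const", (-math.inf, 99))), ("const", (1, 1)))),
        "x6": ("meet", x2, ("const", (100, math.inf))),
    }

def evaluate(e, sigma):
    """Evaluate an expression under the current assignment sigma."""
    if e[0] == "const":
        return e[1]
    if e[0] == "var":
        return sigma[e[1]]
    return OPS[e[0]](*(evaluate(a, sigma) for a in e[1:]))

def solve(system):
    """Phase 2: iterate all constraints until nothing changes."""
    sigma = {v: None for v in system}
    while True:
        new = {v: join(sigma[v], evaluate(e, sigma)) for v, e in system.items()}
        if new == sigma:
            return sigma
        sigma = new

system = generate()
sigma = solve(system)
print(sigma)   # {'x2': (0, 100), 'x6': (100, 100)}

# Explanation consistency in the sense of Definition 1: every value equals
# the evaluation of its own expression under the computed solution.
assert all(evaluate(e, sigma) == sigma[v] for v, e in system.items())
```

Because generation never reads `sigma`, the two phases could be run by different tools or at different times.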

*Example 3.* The analysis of the running example using our most recent constraint system that separates the helper, expression, and interval components of $D'$ is shown in Fig. 5. The analysis in this example produces expressions based on the helper analysis; interval values are not queried. Thus, we can first compute all object-escape information and interval expressions, and only then interval values. We have decreased the amount of unnecessary re-computation (in non-bold font), but clarity for the analysis user is still lacking. We note that we can eliminate unnecessary re-computation altogether by distributing interval computations of different program variables to separate constraint system unknowns [5].

$$\begin{aligned} \llbracket \mathsf{lfp}(f) \rrbracket^{\sharp}_{\mathcal{E}}(\rho) &\subseteq k\langle \Delta, k\langle \nabla, \llbracket f(\bot) \rrbracket^{\sharp}_{\mathcal{E}}(\rho) \rangle\rangle \quad \text{where} \\ k\langle \square, x \rangle &:= \begin{cases} k\langle \square, x \mathbin{\square} \llbracket f(x) \rrbracket^{\sharp}_{\mathcal{E}}(\rho) \rangle, & \text{if } x \not\models \llbracket f(x) \rrbracket^{\sharp}_{\mathcal{E}}(\rho) \\ x, & \text{otherwise} \end{cases} \end{aligned}$$

Fig. 6: Over-approximating lfp using widening and narrowing.

### **5 Obtaining closed-form expressions**

We saw from the previous example that the generated constraint system is not very clear. Thus, as a post-processing step, we may want to produce closed expressions $\mathcal{E}' \supset \mathbb{E}^\sharp$ that compute the respective values. For that, we need to encode the least upper bounds as uninterpreted function calls ($\mathsf{join} \in \mathcal{F}$) and add the least fixpoint operator that takes a lambda expression as an argument.

$$\mathcal{E}' ::= [N, D', \text{Var}] \mid \mathcal{F}(\mathcal{E}'^*) \mid \top \mid \mathsf{lfp}(\lambda x.\, \mathcal{E}')$$

The extended expressions $\mathcal{E}'$ need not form a complete lattice as they are only used as output. Also, we need to note that the expressions we generate specify the least (partial) solution of a constraint system, i.e., the smallest element of the lattice that over-approximates the concrete collecting semantics. For that reason we make use of the least fixpoint expressions $\mathsf{lfp}(f)$, the meaning of which can be described as $\llbracket \mathsf{lfp}(f) \rrbracket^\sharp_{\mathcal{E}}(\rho) = \bigsqcup_{n \in \mathbb{N}} \llbracket f^n(\bot) \rrbracket^\sharp_{\mathcal{E}}(\rho)$. The least over-approximations are not computable in general, e.g., generic constraint system solvers also do not aim to compute the least fixpoint but some nontrivial fixpoint. For a domain with infinite ascending chains, a fixpoint can be computed using an ascending iteration using widening followed by a descending iteration using narrowing [15], as shown in Fig. 6. For Noetherian domains, however, a single precise ascending iteration suffices.
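To make the widening and narrowing iteration concrete, here is a minimal sketch for intervals. It uses the textbook interval widening (unstable bounds jump to infinity) and narrowing (only infinite bounds are refined) together with a standard stopping criterion; these operators and the helper names are assumptions of this sketch and may differ in detail from the scheme in Fig. 6.

```python
import math

def leq(p, q):
    """Interval inclusion; None is bottom."""
    if p is None:
        return True
    if q is None:
        return False
    return q[0] <= p[0] and p[1] <= q[1]

def join(p, q):
    if p is None:
        return q
    if q is None:
        return p
    return (min(p[0], q[0]), max(p[1], q[1]))

def meet(p, q):
    if p is None or q is None:
        return None
    lo, hi = max(p[0], q[0]), min(p[1], q[1])
    return (lo, hi) if lo <= hi else None

def add(p, q):
    if p is None or q is None:
        return None
    return (p[0] + q[0], p[1] + q[1])

def widen(p, q):
    """Keep stable bounds of p, push unstable ones to infinity."""
    if p is None:
        return q
    if q is None:
        return p
    return (p[0] if p[0] <= q[0] else -math.inf,
            p[1] if p[1] >= q[1] else math.inf)

def narrow(p, q):
    """Refine only the infinite bounds of p using q."""
    if p is None or q is None:
        return None
    return (q[0] if p[0] == -math.inf else p[0],
            q[1] if p[1] == math.inf else p[1])

def approx_lfp(f):
    """Over-approximate the least fixpoint of f on intervals."""
    x = f(None)                      # start from f(bottom)
    while not leq(f(x), x):          # ascending phase with widening
        x = widen(x, f(x))
    while True:                      # descending phase with narrowing
        refined = narrow(x, f(x))
        if refined == x:
            return x
        x = refined

def loop_head(z):
    """Loop-head constraint of the running example: 0 join ((z meet [-inf, 99]) + 1)."""
    return join((0, 0), add(meet(z, (-math.inf, 99)), (1, 1)))

x2 = approx_lfp(loop_head)
x6 = meet(x2, (100, math.inf))
print(x2, x6)   # (0, 100) (100, 100)
```

On this input the ascending phase widens `(0, 0)` with `(0, 1)` to `(0, inf)`, and the descending phase narrows that to `(0, 100)`.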

To get closed expressions, we need to inline constraints in such a way that recursion is captured using the fixpoint operator $\mathsf{lfp}$. For that we define a substitution function $\mathsf{subst}(e, \rho)$ where $e \in \mathcal{E}'$ is an expression and $\rho \in (N \times D' \times \text{Var}) \hookrightarrow \mathcal{E}'$ is a partial map from variables to expressions that are to be substituted.

$$\begin{aligned} \mathsf{subst}(\top, \rho) &= \top \\ \mathsf{subst}(g(e_1, \ldots, e_n), \rho) &= g(\mathsf{subst}(e_1, \rho), \ldots, \mathsf{subst}(e_n, \rho)) \\ \mathsf{subst}(\mathsf{lfp}(\lambda x.\, e), \rho) &= \mathsf{lfp}(\lambda x.\, \mathsf{subst}(e, \rho - \{x\})) \\ \mathsf{subst}([x], \rho) &= \begin{cases} [x] & \text{if } x \notin \mathsf{dom}(\rho) \\ \begin{array}{l} \mathtt{let}\ e = \mathsf{subst}(\rho(x), \rho - \{x\})\ \mathtt{in} \\ \mathtt{if}\ x \in \mathsf{FV}(e)\ \mathtt{then}\ \mathsf{lfp}(\lambda x.\, e)\ \mathtt{else}\ e \end{array} & \text{otherwise} \end{cases} \end{aligned}$$

No further substitution is required in case the expression is $\top$. For function application, we recursively perform substitution in the arguments. For fixpoint expressions, we use recursion while decreasing the partial map $\rho$ by the formal parameter $x$. For variables $[x]$, we first determine whether substitution is needed.

$$\begin{aligned} x^2 &= 0 \sqcup ((x^2 \sqcap [-\infty, 99]) + 1) \\ x^6 &= x^2 \sqcap [100, \infty] \end{aligned} \tag{3}$$

$$x^6 = \text{lfp}(\lambda \, z. \, 0 \, \sqcup \, (z \sqcap [-\infty, 99]) + 1) \sqcap [100, \infty] \tag{4}$$

$$y^6 = \text{lfp}(\lambda z.0 \sqcup z) = 0\tag{5}$$

$$\llbracket x^{6} \rrbracket_{\mathcal{E}}^{\sharp}(\rho) = ((0 \nabla [0, 1]) \Delta [0, 100]) \sqcap [100, \infty] = [100, 100] \tag{6}$$

Fig. 7: Simplified interval expressions and evaluation for our running example.

We perform no substitution if $x$ is not a key in $\rho$. If $x$ maps to $e'$ in $\rho$, we first perform the substitution in $e'$ to obtain a closed form; however, we remove $x$ from $\rho$ to ensure termination. Next, if $x$ is still free in the result of the recursive substitution $e''$, we return $\mathsf{lfp}(\lambda x.\, e'')$. If, however, $x$ is not free in $e''$ we can, as an optimization, directly return $e''$.
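The substitution function can be sketched directly from its four cases. The tuple encoding used here (`("top",)`, `("var", x)`, `("app", g, e1, ..., en)`, `("lfp", x, body)`) and the helper names are assumptions made for illustration; interval constants are written as nullary applications. As in the definition, the bound variable of a generated `lfp` reuses the name of the constraint variable.

```python
def free_vars(e):
    """Set of variable names occurring free in e."""
    tag = e[0]
    if tag == "top":
        return set()
    if tag == "var":
        return {e[1]}
    if tag == "app":
        return set().union(*[free_vars(a) for a in e[2:]])
    return free_vars(e[2]) - {e[1]}            # tag == "lfp"

def without(rho, x):
    """Copy of the partial map rho with key x removed."""
    return {k: v for k, v in rho.items() if k != x}

def subst(e, rho):
    """Inline the constraints in rho into e, closing recursion with lfp."""
    tag = e[0]
    if tag == "top":
        return e
    if tag == "app":
        return e[:2] + tuple(subst(a, rho) for a in e[2:])
    if tag == "lfp":
        return ("lfp", e[1], subst(e[2], without(rho, e[1])))
    x = e[1]                                   # tag == "var"
    if x not in rho:
        return e
    body = subst(rho[x], without(rho, x))      # x is removed to ensure termination
    return ("lfp", x, body) if x in free_vars(body) else body

# Constraints of the running example, with constants as nullary applications.
c0, c1 = ("app", "[0,0]"), ("app", "[1,1]")
c99, c100 = ("app", "[-inf,99]"), ("app", "[100,inf]")
rho = {
    "x2": ("app", "join", c0, ("app", "add", ("app", "meet", ("var", "x2"), c99), c1)),
    "x6": ("app", "meet", ("var", "x2"), c100),
}

closed = subst(("var", "x6"), rho)
print(closed[1], closed[2][0], closed[2][1])   # meet lfp x2
```

The result has the shape of a meet of an `lfp` term with the exit condition, which corresponds to the non-recursive form obtained by inlining.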

The justification for using substitution for expressions is given in Lemma 3 that states that substitution retains the least solutions. Using widening and narrowing in conjunction with substitution is not predictable as widening and narrowing are not necessarily monotonic. However, any solution is still a sound over-approximation of the least solution.

**Lemma 3.** *Given a constraint system* $\rho$ *with the least solution* $\sigma$ *such that a constraint* $x \supseteq f_x$ *in* $\rho$ *implies that* $x$ *maps to* $\llbracket f_x \rrbracket^\sharp_{\mathcal{E}}(\sigma)$ *in* $\sigma$*. Then for any subset of constraints* $\rho' \subseteq \rho$ *and expression* $e$ *we have*

$$\llbracket \mathsf{subst}(e, \rho') \rrbracket_{\mathcal{E}}^\sharp(\sigma - \mathsf{dom}(\rho')) = \llbracket e \rrbracket_{\mathcal{E}}^\sharp(\sigma)$$

*Proof.* Using structural induction, the $\top$ case is trivial. Function application and fixpoint iteration cases are applications of the induction hypothesis if we can conclude that substituted variables will not be free after substitution. Similarly, the first case of $[x]$ is trivial. For the second case in $[x]$, we see that the evaluation of $e$ using $\sigma$ will be equal to the value $\sigma(x)$ and if $x$ is not free in $e$ then we have shown our goal. If, however, $x$ is free in $e$ then we can conclude that $\mathsf{lfp}(\lambda x.\, e)$ will be equal to $\sigma(x)$ as $\sigma$ is the least solution. $\square$

*Example 4.* For our running example in Fig. 3a we have computed interval expressions for each program point in Fig. 5. The results are shown in Fig. 7: inlining can produce (based on preference) recursive definitions for $x^2$ and $x^6$ (3) or a single non-recursive definition that uses the $\mathsf{lfp}$ operator (4). We see that the expression for $y^6$ can be simplified to constant zero (5). The evaluation of $x^6$ yields the expected result of exactly one hundred (6).

The function subst can be used to inline all constraints at once to generate a closed expression or, using some custom strategy, to generate a more compact constraint system. For using generated expressions to explain the resulting interval values in a user-friendly way, we may want to inline all variables except function or method calls. As a corollary of Lemma 3, we then have explanation consistency for the least solutions of the closed forms and the interval analysis.

```
12 void foo() {
13   x = 0;
14   y = 0;
15   while (x<100)
16     x += 1;
17   Põder.evalInt(x);
18   Põder.evalInt(y);
19 }

▾ value 100 due to condition "at least 100" on line 15
  ▾ range [0, 100] due to a loop on line 15 on field x
      starting with: value 0 due to constant on line 13
    ▾ cycle with:
      ▾ range [1, 100] due to operation IADD
        ▾ parameter 1:
          ▾ range [0, 99] due to condition "at most 99"
              range [0, 100] due to field x at line 15
          parameter 2: value 1 due to constant on line 16
```
Fig. 8: Reproduction of the explanation provided by Põder from Fig. 2.

**Theorem 1 (Consistency of Closed-Form Explanations).** *For any program point* $v \in N$ *and any context* $d \in D$*, the inlined version of expression* $[v, d]_2$ *describes the least possible interval value of* $[v, d]_3$*, even if the computed interval is an over-approximation.*

### **6 Usability and experimental implementation**

Displaying analysis expressions to the user is a challenge. First, the size of the expressions might be overwhelming for non-trivial programs. Instead of asking the user to grasp the whole expression at once, we should formulate the expression in a way that can be followed step by step. Second, the syntax should be intuitively understandable but sufficiently precise. The terminology should be programmer centric such that it avoids unnecessarily theoretic and program analysis specific terms. This could be fine-tuned based on user studies. Third, the sub-expressions should relate to the analyzed source code in a clear fashion so that knowledge from the expression can be used to better understand the code.

To investigate these usability issues further, we prototyped the proposed analysis method in a new Java bytecode analyzer called Põder. The tool has a source-code view and a control-flow-graph view which may be inspected while stepping through the analysis. Analysis results are presented in a collapsible tree-view, which may be examined by selecting a line in the source-code or selecting a node in the CFG.

Fig. 2, at the start of the paper, is a screenshot of Põder after it has completed the analysis of the code from Fig. 3. For easier readability, we have reproduced this explanation in Fig. 8. When the line 17 from OInt.java is selected by the user, the value of the field x is shown as 100 on the right.

The full explanation of the value in field y, which was not updated in the loop, is "value 0 due to constant on line 14". We note that the loop has been optimized away as shown in Fig. 7. The full explanation of the value in field x is also shown in Fig. 8, even though initially the explanation of the value is partially collapsed at the level of "range [1, 100] due to operation IADD". Leaving the loop is only possible with a value no less than 100. Before that, a loop is entered with a constant value of 0 and each cycle performs an addition operation. The second addend is the constant one while the first addend is the value of x at the head of the loop, satisfying the condition for entering the body, i.e., it is less than 100. Selecting a line from the explanation highlights its source in the source-code view. In the screenshot the line "range [1, 100] due to operation IADD" is selected by the user, after which the source of the operation, at line 16, is highlighted in the source-code view.

- ▾ cycle with:
	- ▾ range [1, +inf] due to operation IADD
		- parameter 1: range [0, +inf] due to field x at the head of the loop in the class OIntBad.
		- parameter 2: value 1 due to constant.

Fig. 9: Explanation for field x if this may escape.
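A step-by-step textual explanation of this kind can be produced by a simple recursive walk over an annotated expression tree. The sketch below is purely illustrative: the node format `(interval, reason, children)`, the `▾` marker, and the function names are assumptions and do not describe how Põder is implemented.

```python
def describe(itv):
    """Phrase an interval as a single value or as a range."""
    lo, hi = itv
    return f"value {lo}" if lo == hi else f"range [{lo}, {hi}]"

def render(node, depth=0):
    """Return the indented explanation lines for node = (interval, reason, children)."""
    itv, reason, children = node
    marker = "▾ " if children else ""
    lines = ["  " * depth + f"{marker}{describe(itv)} due to {reason}"]
    for child in children:
        lines += render(child, depth + 1)
    return lines

# Hand-written tree for the addition inside the loop of the running example.
iadd = ((1, 100), "operation IADD", [
    ((0, 99), 'condition "at most 99"', [
        ((0, 100), "field x at line 15", []),
    ]),
    ((1, 1), "constant on line 16", []),
])

print("\n".join(render(iadd)))
```

A collapsible tree view would show only the first line of each subtree until the user expands it.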

Now let us consider other programs than our running example. Suppose we need to explain a value returned from a method call a = add(5,20). In that case, the explanation will list explanations of parameters, the returned range, and the line of the called method in the source code where the user can find the explanation of that method's return. Next, we look at the case where foo is called on an object that may be visible to other threads; this happens, e.g., when its reference is written to a static field. In such a general case, the analysis handles the object fields context- and flow-insensitively [30]. Thus, guard constraints will not have an effect and the result for field x is $[0, +\infty]$. Furthermore, because of flow-insensitivity, the loop materializes in the fields' value and not directly because of the loop in method foo. The generated explanation is given in Fig. 9.
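As a worked illustration of why the bound is lost, assume (hypothetically) that the flow-insensitive constraint for field x simply collects both writes in foo, the constant 0 and the increment, without the loop guard. With $f(x) = 0 \sqcup (x + 1)$ the Kleene iterates grow without bound:

$$\begin{aligned} x &\supseteq 0 \sqcup (x + 1) \\ f^1(\bot) &= [0, 0], \quad f^2(\bot) = [0, 1], \quad \ldots, \quad f^n(\bot) = [0, n - 1] \\ \bigsqcup_{n \in \mathbb{N}} f^n(\bot) &= [0, +\infty] \end{aligned}$$

Adding one to this result gives the range $[1, +\infty]$ reported for the IADD operation.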

### **7 Generalizability**

The problem of explaining the absence of warnings is challenging, especially from a usability perspective. Our experimental implementation shows that generating explanations for simple inductive invariants is possible. We will now address questions of generalizability to larger programs and state-of-the-art analyses where one may require more fine-grained explanations of the computation and the employed domains are more complex.

*Large Programs.* We have focused here on simpler programs where the explanation is brief and all related program points are close to each other. The implementation can nevertheless handle inter-procedural explanations, and we include an inter-procedural example in the replication package. Explanations may thus span multiple different files, and the explanation tab allows convenient navigation between these files. A limitation of the experimental implementation is that it lacks state-of-the-art abstractions for the analysis of real-world Java programs. As symbolic domains are employed by many real-world analyzers [11, 20, 25], the runtime overhead is not a significant bottleneck. Our method generalizes well to larger programs; the main difficulty is explaining more complex analyses.

*Complicated Computations.* State-of-the-art analyzers handle a wide array of programming language features, such as dynamic memory allocation and thread creation. The analysis is, therefore, built from a combination of domains [17]. Thus, a new abstract value is computed at each step, first, based on the computation for each individual domain, and then, refinement operations are applied (e.g., reduced products) and integrated with the previous state using more complex widening/narrowing operations that may interact with thresholds and counters.

Fig. 10: Bubble sort array bound example.

While the general framework supports any granularity, obtaining more fine-grained explanations requires extending the explanation vocabulary with symbolic operations for low-level operators such as threshold widenings and reduced products. There are serious implementation and usability challenges to obtain readable explanations for complex computations. As we focused on how to explain simpler invariants to end users, we leave the issue of explaining more complex computations as an open problem in explainable static analysis.

*Relational Domains.* Filtering out relevant information happens naturally for pointwise domains such as the box domain. For each program variable, we collect only expressions that affect that variable. So, for justifying the value of a variable, we just need the expression for that variable. No such natural slicing occurs for relational domains where one program variable may depend on other program variables. Thus, the explanation of a relational value must filter out unnecessary information as a post-processing step taking into account the computed solution.

We have not yet worked out a general algorithm that generates arbitrary explanations for relational domains. However, as an example, consider a polyhedral analysis of the Bubble sort algorithm in Figure 10b, where we picked one specific bound condition $j + 1 < \text{len}$ on line 3 to check. The hand-computed explanation is in Figure 10a, where the queried condition is explained using relevant parts of the invariant $i \geq 0 \land j < \text{len} - 1 - i$ for line 3. The explanation of a part of an invariant may refer to other invariants, basic facts, or statements from the program, as seen in Figure 10a. We note that validity of the explanation is not trivial to see, but it nevertheless captures exactly how the analyzer inferred that the access is not outside the array bounds.
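For the reader who wants to check that the stated invariant does imply the queried bound, the arithmetic can be spelled out as follows. This is only a sanity check of the implication and not a reproduction of the explanation in Figure 10a:

$$\begin{aligned} i \geq 0 &\implies \text{len} - 1 - i \leq \text{len} - 1 \\ j < \text{len} - 1 - i \leq \text{len} - 1 &\implies j + 1 < \text{len} \end{aligned}$$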

### **8 Related work**

There has been work in recent years to address usability issues and improve the understandability of static analysis results [26, 28]. These mostly focus on explaining analysis warnings. Zhang et al. [35] present an interactive approach to eliminating false alarms of a sound race detection analysis by applying more aggressive and potentially unsound heuristics. Facts about the program inferred by the analyzer are translated into human-readable queries that the user can confirm or reject; however, the aim is not to explain *how* the results were computed, which is the emphasis of our work.

Combining abstract interpretation [15] and partial evaluation [24] has been considered to the effect of improving partial evaluation [14, 23]. As an alternative to generating constraints explicitly, partial evaluation can also be used in the context of constraint-based program analyses [3]. However, partial evaluation does not allow direct inspection of the intermediate result and has, at times, unpredictable runtime behavior. In the context of partial evaluation of logic program analysis, improved precision and performance have been achieved [29], though not with the goal of producing more explainable analyses. Recently, partial evaluation of Horn clauses has been used for control-flow refinement [19] to increase precision and make implicit control-flow explicit.

Cousot and Cousot [16] have described how sound program transformation can be formalized within abstract interpretation in a general uniform language-independent framework. The correctness of transformation is an orthogonal issue w.r.t. the goals of this paper. Most other applications of meta-analysis focus on reasoning and quantifying precision loss [10, 21], which is again orthogonal to the explainability of the fixpoint computation.

Another related approach has been the drive to generate proof objects, *witnesses*, as evidence for the verdict of the analyzer. For error verification, counterexample witnesses [8] may be generated based on the inspection of expression information to minimize the set of paths required to reach an error state. For correctness, analyzers can output their computed invariants, which can be validated by other analyzers [1, 7, 27]. Being able to produce some artifact as evidence for successful verification is also our goal, but we aim here for explanation that humans find understandable and convenient to navigate.

Finally, we note that symbolic domains [11, 20, 25] are also used to express properties about the program. Thus, these analyses use symbolic expressions over program variables that soundly over-approximate the program state. In contrast, we use expressions involving constraint system variables in order to reason about the constraint system itself and extract an explanation for the computation of the abstract values of program variables at a given program point.

### **9 Conclusion**

The ability to produce counter-examples has been an important reason behind the tremendous success of software model checking. For developers to also see the value in *sound* analysis, more work is needed on explainability, so that a verdict that the program is safe can be trusted by our end users. We have taken a significant step in this direction, and characterized the challenges that lie ahead.

Using interval analysis as an example, we have presented a general scheme to write analyses that generate parts of the constraint system as an intermediate step. The generated constraint system can be transformed into a closed expression and simplified, e.g., to inline computations and even remove unnecessary loops. The closed expressions can be mapped onto user-friendly explanations of how the analysis results are computed, which we have integrated into a prototype tool for explainable program analysis.

**Acknowledgments.** We thank the reviewers for their thoughtful and constructive feedback. This work was supported by the Estonian Research Council grant PSG61 and the Estonian Centre of Excellence in IT (EXCITE), funded by the European Regional Development Fund.

### **References**



## Explainable Online Monitoring of Metric Temporal Logic

Leonardo Lima1()† , Andrei Herasimau<sup>2</sup> , Martin Raszyk3† ,

> Dmitriy Traytel1()† , and Simon Yuan<sup>2</sup>

<sup>1</sup> Department of Computer Science, University of Copenhagen, Copenhagen, Denmark {leonardo,traytel}@di.ku.dk <sup>2</sup> Department of Computer Science, ETH Zurich, Zurich, Switzerland ¨

<sup>3</sup> DFINITY Foundation, Zurich, Switzerland

Abstract. Runtime monitors analyze system execution traces for policy compliance. Monitors for propositional specification languages, such as metric temporal logic (MTL), produce Boolean verdicts denoting whether the policy is satisfied or violated at a given point in the trace. Given a sufficiently complex policy, it can be difficult for the monitor's user to understand how the monitor arrived at its verdict. We develop an MTL monitor that outputs verdicts capturing why the policy was satisfied or violated. Our verdicts are proof trees in a sound and complete proof system that we design. We demonstrate that such verdicts can serve as explanations for end users by augmenting our monitor with a graphical interface for the interactive exploration of proof trees. As a second application, our verdicts serve as certificates in a formally verified checker we develop using the Isabelle proof assistant.

Keywords: metric temporal logic · runtime monitoring · explanations · proof system · formal verification · certification

### 1 Introduction

In runtime verification, monitoring is the task of analyzing an event stream produced by a running system for violations of specified policies. An online monitor for a propositional policy specification language, such as metric temporal logic (MTL), consumes the stream event-wise and gradually produces a stream of Boolean verdicts denoting the policy's satisfaction or violation at every point in the event stream. MTL monitors [3, 19, 24, 27, 33] use complex algorithms, whose correctness is not obvious, to efficiently arrive at the verdicts. Yet, users must rely on the algorithms being correct and correctly implemented, as the computed verdicts carry no information as to why the policy is satisfied or violated.

The two main approaches to increase the reliability of complex algorithm implementations are verification and certification. Formal verification using proof assistants or software verifiers is laborious and, while it provides an ultimate level of trust, the user of a verified tool still gains no insight into why a specific, surely correct verdict was produced. In contrast, certification can yield both trust (especially when the certificate checker is itself formally verified) and insight, provided that the certificate is not only machine-checkable but also human-understandable.

© The Author(s) 2023

<sup>†</sup> Lima and Traytel are supported by a Novo Nordisk Fonden start package grant (NNF20OC0063462). Raszyk's work was carried out during his past employment at ETH Zurich supported by the Swiss National Science Foundation grant Big Data Monitoring (167162). All authors thank David Basin for supporting this work.

https://doi.org/10.1007/978-3-031-30820-8_28

S. Sankaranarayanan and N. Sharygina (Eds.): TACAS 2023, LNCS 13994, pp. 473–491, 2023.

In this paper, we develop a certification approach to MTL monitoring: instead of Boolean verdicts, we require the monitor to produce checkable and understandable certificates. To this end, we develop a sound and complete local proof system (§2) for the satisfaction and violation of MTL policies. Following Cini and Francalanza [15], local means that a proof denotes the policy satisfaction on a given stream of events and not general MTL satisfiability (for any stream). Our proof system is an adaptation of Basin et al.'s [4] local proof system for LTL satisfiability on lasso words to MTL with past and bounded future temporal operators. A core design choice for our proof system was to remain close to the MTL semantics and thus to be understandable for users who reason about policies in terms of the semantics. Therefore, proof trees in our proof system, or rather their compact representation as proof objects (§3), serve as understandable certificates.

With the certificate format in place, we devise an algorithm that computes minimal (in terms of size) proof objects (§4). We implement the algorithm in OCaml and augment it with an interactive web application<sup>1</sup> to visualize and explore the computed proof objects (§5). Independently, we prove the soundness and completeness of our proof system and formally verify a proof checker using the Isabelle/HOL proof assistant. We extract OCaml code from this formalization and use it to check the correctness of the verdicts produced by our unverified algorithm. To ensure that our correct verdicts are also minimal, we develop a second formally verified but less efficient monitoring algorithm in Isabelle, which we use to compute the minimal proof object size when testing our unverified algorithm.

Finally, we demonstrate how our work provides explainable monitoring output through several examples (§6) and empirically evaluate our algorithm's performance in comparison to other monitors (§7). In summary, we make the following contributions:


*Related Work.* We take the work by Basin et al. [4] on optimal proofs for LTL on lasso words as our starting point but change the setting from lasso words to streams of time-stamped events and the logic from LTL to MTL. Moreover, Basin et al. considered the offline path checking problem, whereas we tackle online monitoring here.

Parts of the work presented here are also described in two B.Sc. theses by Yuan [39] and Herasimau [16]. Yuan developed the MTL proof system we present here as well as a monitoring algorithm for computing optimal proofs based on dynamic programming (similarly to Basin et al.'s algorithm [4]). Herasimau formalized Yuan's development in Isabelle/HOL. We use his work as the basis for our formally verified checker. Here, we present a different algorithm that resembles the algorithms used by state-of-the-art monitors for metric first-order temporal logic [5, 29], which perform much better than dynamic programming algorithms for non-trivial metric interval bounds.

Basin et al.'s approach [4] is parameterized by a comparison relation on proof objects that specifies what the algorithm should optimize for. Yuan [39] discovers a flaw in the correctness claim for Basin et al.'s algorithm and corrects it by further restricting the supported comparisons. Herasimau [16] relaxes Yuan's requirements while formally verifying the correctness statement. Our algorithm minimizes the computed proof objects' size as this both simplifies the presentation and caters for a more efficient algorithm.

<sup>1</sup> https://runtime-monitoring.github.io/explanator2

$$\begin{array}{lcl}
i \models p & \text{iff} & p \in \pi_i \\
i \models \neg\alpha & \text{iff} & i \not\models \alpha \\
i \models \alpha \lor \beta & \text{iff} & i \models \alpha \text{ or } i \models \beta \\
i \models \alpha \land \beta & \text{iff} & i \models \alpha \text{ and } i \models \beta \\
i \models \bullet_I\,\alpha & \text{iff} & i > 0 \text{ and } \tau_i - \tau_{i-1} \in I \text{ and } i-1 \models \alpha \\
i \models \bigcirc_I\,\alpha & \text{iff} & \tau_{i+1} - \tau_i \in I \text{ and } i+1 \models \alpha \\
i \models \alpha\,\mathcal{S}_I\,\beta & \text{iff} & j \models \beta \text{ for some } j \le i \text{ with } \tau_i - \tau_j \in I \text{ and } k \models \alpha \text{ for all } j < k \le i \\
i \models \alpha\,\mathcal{U}_I\,\beta & \text{iff} & j \models \beta \text{ for some } j \ge i \text{ with } \tau_j - \tau_i \in I \text{ and } k \models \alpha \text{ for all } i \le k < j
\end{array}$$

Fig. 1: Semantics of MTL for a fixed trace $\rho = \langle(\pi_i, \tau_i)\rangle_{i \in \mathbb{N}}$

Formal verification of monitors is a timely topic. Some verified monitors were developed recently using proof assistants, e.g., VeriMon [29] and Vydra [28] in Isabelle and lattice-mtl [8] in Coq. Others leveraged SMT technology to increase their trustworthiness [12, 14]. To the best of our knowledge, we present the first verified checker for an online monitor's output, even though verified certifiers are standard practice in other areas such as distributed systems [35], model checking [37, 38], and SAT solving [11, 21].

Several monitors visualize their output [1,2,7,18,25,30]; some of these even present visually separate verdicts for different parts of the policy. Our work takes inspiration from these approaches, but goes deeper: our minimal proof trees characterize precisely how the verdicts for the different parts compose to a verdict for the overall policy.

Our work follows the "proof trees as explanations" paradigm and thereby joins a series of works on LTL [4,15,32], CFTL [13], and CTL [9]. Of these only Basin et al. [4] supports past operators and none support metric intervals. Two of the above works [9,15] use proof systems based on the unrolling equations for temporal operators instead of the operator's semantics, which we believe is suboptimal for understandability: users think about the operators in terms of their semantics and not in terms of unrolling equations.

Outside of the realm of temporal logics, one can find the "proof trees as explanations" paradigm in regular expression matching [31] and in the database community [10].

*Metric Temporal Logic.* We briefly recall MTL's syntax and point-based semantics [6]. MTL formulas are built from atomic propositions ($a$, $b$, $c$, ...) via Boolean ($\land$, $\lor$, $\neg$) and metric temporal operators (previous $\bullet_I$, next $\bigcirc_I$, since $\mathcal{S}_I$, until $\mathcal{U}_I$), where $I = [l, r]$ is a non-empty interval of natural numbers with $l \in \mathbb{N}$ and $r \in \mathbb{N} \cup \{\infty\}$. We omit the interval when $l = 0$ and $r = \infty$. For the until operator $\mathcal{U}_{[l,r]}$, we require the interval to be bounded, i.e., $r \neq \infty$. Formulas are interpreted over streams of time-stamped events $\rho = \langle(\pi_i, \tau_i)\rangle_{i \in \mathbb{N}}$, also called traces. An event $\pi_i$ is a set of atomic propositions that hold at the respective time-point $i$. Time-stamps $\tau_i$ are natural numbers that are required to be monotone (i.e., $i \le j$ implies $\tau_i \le \tau_j$) and progressing (i.e., for all $\tau$ there exists a time-point $i$ with $\tau_i > \tau$). Note that consecutive time-points can have the same time-stamp. Figure 1 shows MTL's standard semantics for a formula $\varphi$ at time-point $i$ for a fixed trace $\rho$.

Fix a trace $\rho = \langle(\pi_i, \tau_i)\rangle_{i \in \mathbb{N}}$. The *earliest time-point* of a time-stamp $\tau$ on $\rho$ is the smallest time-point $i$ such that $\tau_i \ge \tau$ and is denoted as $\mathsf{ETP}_\rho(\tau)$. Similarly, the *latest time-point* of a time-stamp $\tau \ge \tau_0$ on $\rho$ is the greatest time-point $i$ such that $\tau_i \le \tau$ and is denoted as $\mathsf{LTP}_\rho(\tau)$. Whenever the trace $\rho$ is fixed, we will only write $\mathsf{ETP}(\tau)$ and $\mathsf{LTP}(\tau)$.
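To make the point-based semantics concrete, the following is a minimal Python sketch of a reference evaluator for the Boolean and past-time operators of Figure 1 on a finite trace prefix. The tuple encoding of formulas, the function name `sat`, and the six-point sample trace are assumptions made for illustration; this is not the monitoring algorithm of the paper, and the future operators are left out because they need time-points beyond a finite prefix.

```python
# Hypothetical formula encoding (nested tuples):
#   ("ap", "a"), ("not", f), ("or", f, g), ("and", f, g),
#   ("prev", (l, r), f), ("since", (l, r), f, g)
# The upper bound r may be float("inf").

def sat(trace, i, f):
    """Does f hold at time-point i of trace = [(event_set, timestamp), ...]?"""
    op = f[0]
    if op == "ap":
        return f[1] in trace[i][0]
    if op == "not":
        return not sat(trace, i, f[1])
    if op == "or":
        return sat(trace, i, f[1]) or sat(trace, i, f[2])
    if op == "and":
        return sat(trace, i, f[1]) and sat(trace, i, f[2])
    if op == "prev":
        (l, r), g = f[1], f[2]
        return i > 0 and l <= trace[i][1] - trace[i - 1][1] <= r and sat(trace, i - 1, g)
    if op == "since":
        (l, r), alpha, beta = f[1], f[2], f[3]
        # Some j <= i in the interval satisfies beta, and alpha holds on (j, i].
        return any(
            l <= trace[i][1] - trace[j][1] <= r
            and sat(trace, j, beta)
            and all(sat(trace, k, alpha) for k in range(j + 1, i + 1))
            for j in range(i + 1)
        )
    raise ValueError(f"unsupported operator: {op}")


# Sample trace with time-stamps 1, 3, 3, 3, 3, 4 (time-points are zero-based).
trace = [({"a", "b", "c"}, 1), ({"a", "b"}, 3), ({"a", "b"}, 3),
         (set(), 3), ({"a"}, 3), ({"a"}, 4)]
phi = ("since", (1, 2), ("ap", "a"), ("and", ("ap", "b"), ("ap", "c")))
print(sat(trace, 1, phi))  # True
print(sat(trace, 5, phi))  # False
```

The evaluator recomputes subformulas from scratch at every time-point, so it only serves as a readable specification against which faster algorithms could be compared.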

#### 2 Local Proof System

$$\begin{array}{c}
\dfrac{a \in \pi_i}{i \vdash^+ a}\,ap^+ \qquad
\dfrac{i \vdash^- \alpha}{i \vdash^+ \neg\alpha}\,\neg^+ \qquad
\dfrac{i \vdash^+ \alpha}{i \vdash^+ \alpha \lor \beta}\,\lor^+_L \qquad
\dfrac{i \vdash^+ \beta}{i \vdash^+ \alpha \lor \beta}\,\lor^+_R \qquad
\dfrac{i \vdash^+ \alpha \quad i \vdash^+ \beta}{i \vdash^+ \alpha \land \beta}\,\land^+ \\[3ex]
\dfrac{a \notin \pi_i}{i \vdash^- a}\,ap^- \qquad
\dfrac{i \vdash^+ \alpha}{i \vdash^- \neg\alpha}\,\neg^- \qquad
\dfrac{i \vdash^- \alpha}{i \vdash^- \alpha \land \beta}\,\land^-_L \qquad
\dfrac{i \vdash^- \beta}{i \vdash^- \alpha \land \beta}\,\land^-_R \qquad
\dfrac{i \vdash^- \alpha \quad i \vdash^- \beta}{i \vdash^- \alpha \lor \beta}\,\lor^- \\[3ex]
\dfrac{j \le i \quad \tau_i - \tau_j \in I \quad j \vdash^+ \beta \quad \forall k \in (j, i].\ k \vdash^+ \alpha}{i \vdash^+ \alpha\,\mathcal{S}_I\,\beta}\,\mathcal{S}^+ \qquad
\dfrac{i > 0 \quad \tau_i - \tau_{i-1} \in I \quad i-1 \vdash^+ \alpha}{i \vdash^+ \bullet_I\,\alpha}\,\bullet^+ \\[3ex]
\dfrac{\mathsf{E}^{\mathsf{p}}_i([l,r]) \le j \quad j \le i \quad m = \mathsf{L}^{\mathsf{p}}_i([l,r]) \quad \tau_i - \tau_0 \ge l \quad j \vdash^- \alpha \quad \forall k \in [j, m].\ k \vdash^- \beta}{i \vdash^- \alpha\,\mathcal{S}_{[l,r]}\,\beta}\,\mathcal{S}^- \\[3ex]
\dfrac{j = \mathsf{E}^{\mathsf{p}}_i([l,r]) \quad m = \mathsf{L}^{\mathsf{p}}_i([l,r]) \quad \tau_i - \tau_0 \ge l \quad \forall k \in [j, m].\ k \vdash^- \beta}{i \vdash^- \alpha\,\mathcal{S}_{[l,r]}\,\beta}\,\mathcal{S}^-_\infty \qquad
\dfrac{\tau_i - \tau_0 < l}{i \vdash^- \alpha\,\mathcal{S}_{[l,r]}\,\beta}\,\mathcal{S}^-_{<I} \\[3ex]
\dfrac{}{0 \vdash^- \bullet_I\,\alpha}\,\bullet^-_0 \qquad
\dfrac{i > 0 \quad i-1 \vdash^- \alpha}{i \vdash^- \bullet_I\,\alpha}\,\bullet^- \qquad
\dfrac{i > 0 \quad \tau_i - \tau_{i-1} < I}{i \vdash^- \bullet_I\,\alpha}\,\bullet^-_{<I} \qquad
\dfrac{i > 0 \quad \tau_i - \tau_{i-1} > I}{i \vdash^- \bullet_I\,\alpha}\,\bullet^-_{>I} \\[3ex]
\dfrac{i \le j \quad \tau_j - \tau_i \in I \quad j \vdash^+ \beta \quad \forall k \in [i, j).\ k \vdash^+ \alpha}{i \vdash^+ \alpha\,\mathcal{U}_I\,\beta}\,\mathcal{U}^+ \qquad
\dfrac{\tau_{i+1} - \tau_i \in I \quad i+1 \vdash^+ \alpha}{i \vdash^+ \bigcirc_I\,\alpha}\,\bigcirc^+ \\[3ex]
\dfrac{m = \mathsf{E}^{\mathsf{f}}_i(I) \quad i \le j \quad j \le \mathsf{L}^{\mathsf{f}}_i(I) \quad j \vdash^- \alpha \quad \forall k \in [m, j].\ k \vdash^- \beta}{i \vdash^- \alpha\,\mathcal{U}_I\,\beta}\,\mathcal{U}^- \qquad
\dfrac{\tau_{i+1} - \tau_i < I}{i \vdash^- \bigcirc_I\,\alpha}\,\bigcirc^-_{<I} \\[3ex]
\dfrac{m = \mathsf{E}^{\mathsf{f}}_i(I) \quad j = \mathsf{L}^{\mathsf{f}}_i(I) \quad \forall k \in [m, j].\ k \vdash^- \beta}{i \vdash^- \alpha\,\mathcal{U}_I\,\beta}\,\mathcal{U}^-_\infty \qquad
\dfrac{i+1 \vdash^- \alpha}{i \vdash^- \bigcirc_I\,\alpha}\,\bigcirc^- \qquad
\dfrac{\tau_{i+1} - \tau_i > I}{i \vdash^- \bigcirc_I\,\alpha}\,\bigcirc^-_{>I}
\end{array}$$

Fig. 2: Local proof system for MTL for a fixed trace $\rho = \langle(\pi_i, \tau_i)\rangle_{i \in \mathbb{N}}$

We introduce a local proof system for monitoring MTL formulas as the least relation satisfying the rules shown in Figure 2. It contains two mutually dependent judgments: $\vdash^+$ (for satisfaction proofs) and $\vdash^-$ (for violation proofs). A satisfaction (violation) proof describes the satisfaction (violation) of a formula at a given time-point on a fixed trace $\rho$. Each rule is suffixed by $^+$ or $^-$, indicating whether an operator has been satisfied or violated. Moreover, we define $\mathsf{E}^{\mathsf{p}}_i(I) := \mathsf{ETP}(\tau_i - r)$ and $\mathsf{L}^{\mathsf{p}}_i(I) := \min(i, \mathsf{LTP}(\tau_i - l))$ for $I = [l, r]$, which correspond to the earliest and latest time-point within the interval $I$, respectively, when formulas having $\mathcal{S}_I$ as their topmost operator are considered. In the definition of $\mathsf{L}^{\mathsf{p}}_i(I)$ we take the minimum to account for consecutive time-stamps with the same value. For formulas having $\mathcal{U}_I$ as their topmost operator, both definitions are mirrored, resulting in $\mathsf{E}^{\mathsf{f}}_i(I) := \max(i, \mathsf{ETP}(\tau_i + l))$ and $\mathsf{L}^{\mathsf{f}}_i(I) := \mathsf{LTP}(\tau_i + r)$.
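The window bounds just defined can be computed with binary search over the (monotone) list of time-stamps. The sketch below is an illustration under assumptions: the function names `past_window` and `future_window` are hypothetical, the trace is a finite prefix given as a plain list of time-stamps, and the future window is only reported once the prefix has moved past $\tau_i + r$, since a shorter prefix cannot determine $\mathsf{LTP}(\tau_i + r)$.

```python
import math
from bisect import bisect_left, bisect_right


def past_window(taus, i, l, r):
    """Return (E^p_i(I), L^p_i(I)) for I = [l, r]; r may be math.inf.

    Assumes taus[i] - taus[0] >= l, so that LTP(tau_i - l) is defined.
    """
    lower = -math.inf if r == math.inf else taus[i] - r
    earliest = bisect_left(taus, lower)                    # ETP(tau_i - r)
    latest = min(i, bisect_right(taus, taus[i] - l) - 1)   # min(i, LTP(tau_i - l))
    return earliest, latest


def future_window(taus, i, l, r):
    """Return (E^f_i(I), L^f_i(I)) for bounded I = [l, r], or None if undecided."""
    upper = taus[i] + r
    if taus[-1] <= upper:
        return None  # a later time-point might still fall into the window
    earliest = max(i, bisect_left(taus, taus[i] + l))      # max(i, ETP(tau_i + l))
    latest = bisect_right(taus, upper) - 1                 # LTP(tau_i + r)
    return earliest, latest


taus = [1, 3, 3, 3, 3, 4]
print(past_window(taus, 5, 1, 2))    # (1, 4)
print(past_window(taus, 2, 0, 2))    # (0, 2): the minimum with i cuts off time-points 3 and 4
print(future_window(taus, 0, 1, 2))  # (1, 4)
print(future_window(taus, 1, 0, 1))  # None
```

The second call shows why the minimum is needed: time-points 3 and 4 carry the same time-stamp as time-point 2 but lie in its future, so they must not be part of the past window.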

The semantics of the MTL operators directly corresponds to the satisfaction rules $ap^+$, $\neg^+$, $\lor^+_L$, $\lor^+_R$, $\land^+$, $\mathcal{S}^+$, $\mathcal{U}^+$, $\bullet^+$, and $\bigcirc^+$. For instance, consider two time-points $j$ and $i$ such that $j \le i$. The rule $\mathcal{S}^+$ is applied whenever the time-stamp difference $\tau_i - \tau_j$ belongs to the interval $I$, and there is a witness for a satisfaction proof of $\beta$ in the form of $j \vdash^+ \beta$ together with a finite sequence of satisfaction proofs of $\alpha$ for all $k \in (j, i]$. The violation rules for the non-temporal operators $ap^-$, $\neg^-$, $\lor^-$, $\land^-_L$, $\land^-_R$ are dual to their satisfaction counterparts. On the other hand, the violation rules for the temporal operators $\bullet_I$, $\bigcirc_I$, $\mathcal{S}_I$, and $\mathcal{U}_I$ are derived by negating and rewriting their semantics. Consider $\mathcal{S}_I$ with $I = [l, r]$:

$$\begin{array}{rl}
i \not\models \alpha\,\mathcal{S}_I\,\beta \;\leftrightarrow & \left(\tau_i - \tau_0 \ge l \land \exists j \in [\mathsf{E}^{\mathsf{p}}_i(I), i].\ j \not\models \alpha \land \forall k \in [j, \mathsf{L}^{\mathsf{p}}_i(I)].\ k \not\models \beta\right) \lor \\
& \left(\tau_i - \tau_0 \ge l \land \forall k \in [\mathsf{E}^{\mathsf{p}}_i(I), \mathsf{L}^{\mathsf{p}}_i(I)].\ k \not\models \beta\right) \lor\ \tau_i - \tau_0 < l
\end{array} \tag{1}$$

The rules $\mathcal{S}^-$, $\mathcal{S}^-_\infty$, and $\mathcal{S}^-_{<I}$ correspond to the three disjuncts in Equation (1). We argue that these three cases intuitively represent different ways of violating a since operator. In the first disjunct, $\alpha$ is violated at some time-point after the interval starts and $\beta$ is violated from that time-point until the interval ends. Indeed, the violation proof $j \vdash^- \alpha$ is enough to dismiss all previous occurrences of a satisfaction of $\beta$. Moreover, if $l \neq 0$, i.e., if the interval does not include the current time-point, then $\alpha$ may be violated between the interval's end and the current time-point. Figure 3(a) shows both cases, with violations of a formula marked explicitly. In the second disjunct, $\beta$ is violated at every time-point inside the interval (Figure 3(b)). The third disjunct captures the special case at the beginning of the trace when the interval is located before the first time-point (Figure 3(c)).

Fig. 3: Graphical representation of the violation cases for $\alpha\,\mathcal{S}_I\,\beta$ with $I = [l, r]$

Next, we consider $\mathcal{U}_I$:

$$\begin{array}{rl}
i \not\models \alpha\,\mathcal{U}_I\,\beta \;\leftrightarrow & \left(\exists j \in [i, \mathsf{L}^{\mathsf{f}}_i(I)].\ j \not\models \alpha \land \forall k \in [\mathsf{E}^{\mathsf{f}}_i(I), j].\ k \not\models \beta\right) \lor \\
& \left(\forall k \in [\mathsf{E}^{\mathsf{f}}_i(I), \mathsf{L}^{\mathsf{f}}_i(I)].\ k \not\models \beta\right)
\end{array} \tag{2}$$

The rules $\mathcal{U}^-$ and $\mathcal{U}^-_\infty$ correspond to the two disjuncts in Equation (2). In the first disjunct, $\beta$ is violated from the interval start until a time-point $j$ at which also $\alpha$ is violated. Symmetrically to $\mathcal{S}^-$, we can dismiss all satisfactions of $\beta$ after $j$ because of the violation proof $j \vdash^- \alpha$. In the second disjunct, $\beta$ is violated at every time-point inside the interval.
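The three-way case split for since can be sanity-checked by brute force. The following Python sketch compares the right-hand side of Equation (1) with the negated semantics on small random traces. It is an illustration under assumptions, not part of the paper's formalization: the function names are hypothetical, the subformulas are abstracted to per-time-point truth values, the existential quantifier ranges over the closed range from $\mathsf{E}^{\mathsf{p}}_i(I)$ to $i$ (as in rule $\mathcal{S}^-$), and only bounded intervals are sampled.

```python
import random
from bisect import bisect_left, bisect_right


def since_holds(taus, alpha, beta, i, l, r):
    """Semantics of alpha S_[l,r] beta at i; alpha and beta are lists of truth values."""
    return any(
        l <= taus[i] - taus[j] <= r and beta[j] and all(alpha[j + 1 : i + 1])
        for j in range(i + 1)
    )


def since_violated(taus, alpha, beta, i, l, r):
    """Right-hand side of Equation (1)."""
    if taus[i] - taus[0] < l:                          # third disjunct
        return True
    e = bisect_left(taus, taus[i] - r)                 # E^p_i(I)
    m = min(i, bisect_right(taus, taus[i] - l) - 1)    # L^p_i(I)

    def beta_violated_from(lo):
        # beta is violated at every time-point in [lo, m] (vacuously if lo > m)
        return not any(beta[lo : m + 1])

    first = any(not alpha[j] and beta_violated_from(j) for j in range(e, i + 1))
    second = beta_violated_from(e)
    return first or second


random.seed(0)
for _ in range(2000):
    n = random.randint(1, 7)
    taus = sorted(random.randint(0, 6) for _ in range(n))
    alpha = [random.random() < 0.6 for _ in range(n)]
    beta = [random.random() < 0.3 for _ in range(n)]
    l = random.randint(0, 3)
    r = l + random.randint(0, 3)
    for i in range(n):
        assert since_violated(taus, alpha, beta, i, l, r) == (
            not since_holds(taus, alpha, beta, i, l, r)
        )
print("Equation (1) agrees with the semantics on all sampled traces")
```

Such a randomized comparison is of course no substitute for a proof; it merely guards against transcription slips in the case analysis.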

Theorem 1. *Fix an arbitrary trace $\rho = \langle(\pi_i, \tau_i)\rangle_{i \in \mathbb{N}}$. For any formula $\varphi$ and $i \in \mathbb{N}$, we have $i \vdash^+ \varphi$ iff $i \models \varphi$ and $i \vdash^- \varphi$ iff $i \not\models \varphi$, i.e., the proof system is sound and complete.*

In other words, proof trees in our proof system contain all the necessary information to explain why a formula has been satisfied or violated on a given trace. A mechanically checked proof of the above statement can be found in our Isabelle formalization [22].

*Example 1.* Let $\rho = \langle(\{a, b, c\}, 1), (\{a, b\}, 3), (\{a, b\}, 3), (\{\cdot\}, 3), (\{a\}, 3), (\{a\}, 4)\rangle$ and $\varphi = a\,\mathcal{S}_{[1,2]}\,(b \land c)$. A proof of $5 \not\models \varphi$ has the following form:

$$\dfrac{\dfrac{a \notin \{\cdot\}}{3 \vdash^- a}\,ap^- \qquad \dfrac{\dfrac{b \notin \{\cdot\}}{3 \vdash^- b}\,ap^-}{3 \vdash^- b \land c}\,\land^-_L \qquad \dfrac{\dfrac{b \notin \{a\}}{4 \vdash^- b}\,ap^-}{4 \vdash^- b \land c}\,\land^-_L}{5 \vdash^- a\,\mathcal{S}_{[1,2]}\,(b \land c)}\,\mathcal{S}^-$$

In $\rho$, only events with time-stamp 3 satisfy the interval conditions, resulting in $\mathsf{E}^{\mathsf{p}}_5(I) = 1$ and $\mathsf{L}^{\mathsf{p}}_5(I) = 4$, where $I = [1, 2]$. (Time-points are zero-based.) Thus, the portion of the trace we are interested in is $\langle(\{a, b\}, 3), (\{a, b\}, 3), (\{\cdot\}, 3), (\{a\}, 3)\rangle$. Here, $a$ is only violated at time-point 3, so our proof includes the witness $3 \vdash^- a$. From there until time-point $\mathsf{L}^{\mathsf{p}}_5(I) = 4$ the subformula $b \land c$ is violated, witnessed by $3 \vdash^- b$ and $4 \vdash^- b$. ■

#### 3 Proof Objects

To make proofs from our proof system explicit, we define an inductive syntax for satisfaction ($\mathfrak{sp}$) and violation ($\mathfrak{vp}$) proofs and call this representation *proof objects*. Proof objects allow us to easily compute with, modify, and compare the size of proof trees. From now on, the term proof will be used for both proof tree and proof object.

$$\begin{array}{rcl}
\mathfrak{sp} & = & ap^{+}(\mathbb{N}, \Sigma) \mid \neg^{+}(\mathfrak{vp}) \mid \lor^{+}_{L}(\mathfrak{sp}) \mid \lor^{+}_{R}(\mathfrak{sp}) \mid \land^{+}(\mathfrak{sp}, \mathfrak{sp}) \mid \bullet^{+}(\mathfrak{sp}) \mid \bigcirc^{+}(\mathfrak{sp}) \\
& \mid & \mathcal{S}^{+}(\mathfrak{sp}, \overline{\mathfrak{sp}}_{\emptyset}) \mid \mathcal{U}^{+}(\mathfrak{sp}, \overline{\mathfrak{sp}}_{\emptyset}) \\
\mathfrak{vp} & = & ap^{-}(\mathbb{N}, \Sigma) \mid \neg^{-}(\mathfrak{sp}) \mid \lor^{-}(\mathfrak{vp}, \mathfrak{vp}) \mid \land^{-}_{L}(\mathfrak{vp}) \mid \land^{-}_{R}(\mathfrak{vp}) \mid \bullet^{-}(\mathfrak{vp}) \mid \bullet^{-}_{<I}(\mathbb{N}) \\
& \mid & \bullet^{-}_{>I}(\mathbb{N}) \mid \bullet^{-}_{0} \mid \bigcirc^{-}(\mathfrak{vp}) \mid \bigcirc^{-}_{<I}(\mathbb{N}) \mid \bigcirc^{-}_{>I}(\mathbb{N}) \mid \mathcal{S}^{-}_{<I}(\mathbb{N}) \mid \mathcal{S}^{-}(\mathbb{N}, \mathfrak{vp}, \overline{\mathfrak{vp}}_{\emptyset}) \\
& \mid & \mathcal{S}^{-}_{\infty}(\mathbb{N}, \overline{\mathfrak{vp}}_{\emptyset}) \mid \mathcal{U}^{-}(\mathbb{N}, \mathfrak{vp}, \overline{\mathfrak{vp}}_{\emptyset}) \mid \mathcal{U}^{-}_{\infty}(\mathbb{N}, \overline{\mathfrak{vp}}_{\emptyset})
\end{array}$$

Here, $\overline{\mathfrak{sp}}$ and $\overline{\mathfrak{vp}}$ denote finite non-empty sequences of $\mathfrak{sp}$ and $\mathfrak{vp}$ subproofs and $\overline{\mathfrak{sp}}_{\emptyset}$ and $\overline{\mathfrak{vp}}_{\emptyset}$ denote finite possibly empty sequences of $\mathfrak{sp}$ and $\mathfrak{vp}$ subproofs. We define $\mathfrak{p} = \mathfrak{sp} \uplus \mathfrak{vp}$ to be the disjoint union of satisfaction and violation proofs. Given a proof $p \in \mathfrak{p}$, we define $\mathsf{V}(p)$ to be $\top$ if $p \in \mathfrak{sp}$ and $\bot$ if $p \in \mathfrak{vp}$. Each constructor corresponds to a rule in our proof system. Each proof $p$ has an associated time-point $\mathsf{tp}(p)$ for which it witnesses the satisfaction or violation. In some cases, $\mathsf{tp}(p)$ can be computed recursively from $p$'s subproofs. For example, $\mathsf{tp}(\mathcal{S}^{+}(p, [q_1, \ldots, q_n]))$ is $\mathsf{tp}(q_n)$ if $n > 0$ and $\mathsf{tp}(p)$ otherwise. Similarly, $\mathsf{tp}(\mathcal{U}^{+}(p, [q_1, \ldots, q_n]))$ is $\mathsf{tp}(q_1)$ if $n > 0$ and $\mathsf{tp}(p)$ otherwise. Other cases, namely $ap^{+}$, $ap^{-}$, $\bullet^{-}_{<I}$, $\bullet^{-}_{>I}$, $\bigcirc^{-}_{<I}$, $\bigcirc^{-}_{>I}$, $\mathcal{S}^{-}_{<I}$, $\mathcal{S}^{-}$, and $\mathcal{S}^{-}_{\infty}$, explicitly store the associated time-points as an argument of type $\mathbb{N}$ because we cannot compute them from the respective subproofs. For example, $\mathsf{tp}(ap^{+}(j, a)) = j$ and $\mathsf{tp}(\mathcal{S}^{-}(j, q, [p_1, \ldots, p_n])) = j$.

Given a trace $\rho = \langle(\pi_i, \tau_i)\rangle_{i \in \mathbb{N}}$ and a formula $\varphi$, we call a proof $p$ *valid* at $\mathsf{tp}(p)$, denoted by $p \vdash \varphi$, if $p$ represents a valid proof according to the rules of our local proof system. Note that once again we leave the dependency on $\rho$ implicit in $p \vdash \varphi$. Formally, validity $p \vdash \varphi$ is defined recursively, checking for each constructor that the corresponding rule has been correctly applied. For example, atomic proofs are valid if the mentioned atom is (not) contained in the trace at the specified time-points: $ap^{+}(i, a) \vdash a \leftrightarrow a \in \pi_i$ ($ap^{-}(i, a) \vdash a \leftrightarrow a \notin \pi_i$). Moreover, for $r = \mathcal{S}^{+}(p, [q_1, \ldots, q_n])$ we have

$$\begin{array}{l} r \vdash \alpha\,\mathcal{S}_{I}\,\beta \;\leftrightarrow\; \mathsf{tp}(p) \le \mathsf{tp}(r) \land \tau_{\mathsf{tp}(r)} - \tau_{\mathsf{tp}(p)} \in I \;\land \\ \qquad [\mathsf{tp}(q_1), \ldots, \mathsf{tp}(q_n)] = [\mathsf{tp}(p) + 1, \mathsf{tp}(r)] \land p \vdash \beta \land (\forall k \in [1, n].\ q_k \vdash \alpha). \end{array}$$
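The validity condition for $\mathcal{S}^{+}$ translates almost literally into code. The sketch below is a hedged illustration for the fragment consisting of $ap^{+}$, $\land^{+}$, and $\mathcal{S}^{+}$; the tuple encodings of formulas and proofs and the names `tp` and `valid` are assumptions, and the formally verified checker described in this paper is developed in Isabelle, not in this form.

```python
# Hypothetical encodings.
#   Formulas: ("ap", a), ("and", f, g), ("since", (l, r), f, g)
#   Satisfaction proofs: ("ap+", i, a), ("and+", p, q), ("S+", p, [q1, ..., qn])


def tp(p):
    """Time-point for which the proof p witnesses satisfaction."""
    if p[0] == "ap+":
        return p[1]                      # stored explicitly
    if p[0] == "and+":
        return tp(p[1])
    if p[0] == "S+":
        beta_proof, alpha_proofs = p[1], p[2]
        return tp(alpha_proofs[-1]) if alpha_proofs else tp(beta_proof)
    raise ValueError(f"unsupported constructor: {p[0]}")


def valid(trace, p, f):
    """Check p |- f on trace = [(event_set, timestamp), ...]."""
    if p[0] == "ap+":
        return f == ("ap", p[2]) and p[2] in trace[p[1]][0]
    if p[0] == "and+":
        return (f[0] == "and" and tp(p[1]) == tp(p[2])
                and valid(trace, p[1], f[1]) and valid(trace, p[2], f[2]))
    if p[0] == "S+":
        if f[0] != "since":
            return False
        (l, r), alpha, beta = f[1], f[2], f[3]
        beta_proof, alpha_proofs = p[1], p[2]
        j, i = tp(beta_proof), tp(p)
        return (j <= i
                and l <= trace[i][1] - trace[j][1] <= r
                # the alpha proofs must cover exactly the time-points j+1, ..., i
                and [tp(q) for q in alpha_proofs] == list(range(j + 1, i + 1))
                and valid(trace, beta_proof, beta)
                and all(valid(trace, q, alpha) for q in alpha_proofs))
    return False


trace = [({"a", "b", "c"}, 1), ({"a", "b"}, 3), ({"a", "b"}, 3),
         (set(), 3), ({"a"}, 3), ({"a"}, 4)]
phi = ("since", (1, 2), ("ap", "a"), ("and", ("ap", "b"), ("ap", "c")))
witness = ("and+", ("ap+", 0, "b"), ("ap+", 0, "c"))
good = ("S+", witness, [("ap+", 1, "a"), ("ap+", 2, "a")])
gap = ("S+", witness, [("ap+", 2, "a")])      # skips time-point 1
print(tp(good), valid(trace, good, phi))      # 2 True
print(valid(trace, gap, phi))                 # False
```

The second proof is rejected because its sequence of $\alpha$ proofs does not cover every time-point after the $\beta$ witness, which is exactly what the list equality in the displayed condition enforces.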

Multiple valid proofs may exist for a time-point $i$ and formula $\varphi$ as we demonstrate next.

*Example 2.* The proof object representing the proof tree from Example 1 is $P_1 = \mathcal{S}^{-}(5, ap^{-}(3, a), [\land^{-}_{L}(ap^{-}(3, b)), \land^{-}_{L}(ap^{-}(4, b))])$. However, we could have argued differently, using the fact that $c$ is violated at all time-points inside the interval. Then, $\mathcal{S}^{-}_{\infty}$ would be used instead to construct the proof $P_2 = \mathcal{S}^{-}_{\infty}(5, [\land^{-}_{R}(ap^{-}(1, c)), \land^{-}_{R}(ap^{-}(2, c)), \land^{-}_{R}(ap^{-}(3, c)), \land^{-}_{R}(ap^{-}(4, c))])$, which is also a valid proof at $\mathsf{tp}(P_2) = 5$. In addition, $P_3 = \mathcal{S}^{-}(5, ap^{-}(3, a), [\land^{-}_{R}(ap^{-}(3, c)), \land^{-}_{R}(ap^{-}(4, c))])$ is another valid proof at $\mathsf{tp}(P_3) = 5$. It has the same shape as $P_1$, but instead of using the violations of $b$ as witnesses for time-points 3 and 4, it uses the violations of $c$ (and hence the rule $\land^{-}_{R}$ for the right conjunct). In fact, both $b$ and $c$ are violated at time-points 3 and 4, so we can use either to justify the violations of $b \land c$.

We now compare $P_1$, $P_2$, and $P_3$. The proof $P_2$ uses $\mathcal{S}^{-}_{\infty}$, so we must store a witness of the violation of $b \land c$ for each one of the 4 time-points inside the interval. The proofs $P_1$ and $P_3$ use $\mathcal{S}^{-}$, taking advantage of the violation proof $3 \vdash^{-} a$ that allows us to dismiss both $1 \vdash^{+} a$ and $2 \vdash^{+} a$. Formally, we define the size $|p|$ of a proof $p$ to be the number of proof object constructors occurring in $p$. Then, $|P_1| = |P_3| = 6$, and $|P_2| = 9$. ■
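The comparison can be replayed mechanically. The following Python sketch encodes violation proofs for the fragment used in the example ($ap^{-}$, $\land^{-}_{L}$, $\land^{-}_{R}$, $\mathcal{S}^{-}$, $\mathcal{S}^{-}_{\infty}$), checks them against the rules of Figure 2, and counts constructors. The tuple encoding and the function names are assumptions for illustration, the event written $\{\cdot\}$ is assumed to contain none of $a$, $b$, $c$, and the $c$-based variant is built with the right-conjunct rule because $c$ is the right conjunct of $b \land c$.

```python
from bisect import bisect_left, bisect_right

# Hypothetical encodings.
#   Formulas: ("ap", a), ("and", f, g), ("since", (l, r), f, g)
#   Violation proofs: ("ap-", i, a), ("and-L", q), ("and-R", q),
#                     ("S-", i, q, [p1, ...]), ("S-inf", i, [p1, ...])


def tp(p):
    return tp(p[1]) if p[0] in ("and-L", "and-R") else p[1]


def size(p):
    """Number of proof object constructors occurring in p."""
    if p[0] == "ap-":
        return 1
    if p[0] in ("and-L", "and-R"):
        return 1 + size(p[1])
    if p[0] == "S-":
        return 1 + size(p[2]) + sum(size(x) for x in p[3])
    return 1 + sum(size(x) for x in p[2])              # "S-inf"


def valid(trace, p, f):
    """Check a violation proof p of formula f on a finite trace prefix."""
    taus = [ts for _, ts in trace]
    rule = p[0]
    if rule == "ap-":
        return f == ("ap", p[2]) and p[2] not in trace[p[1]][0]
    if rule in ("and-L", "and-R"):
        return f[0] == "and" and valid(trace, p[1], f[1] if rule == "and-L" else f[2])
    if rule in ("S-", "S-inf") and f[0] == "since":
        (l, r), alpha, beta = f[1], f[2], f[3]
        i = p[1]
        if taus[i] - taus[0] < l:
            return False                               # only the rule for "<I" applies here
        e = bisect_left(taus, taus[i] - r)             # E^p_i(I)
        m = min(i, bisect_right(taus, taus[i] - l) - 1)  # L^p_i(I)
        if rule == "S-":
            q, ps = p[2], p[3]
            j = tp(q)
            if not (e <= j <= i and valid(trace, q, alpha)):
                return False
        else:
            ps, j = p[2], e
        # the beta violations must cover exactly the time-points j, ..., m
        return ([tp(x) for x in ps] == list(range(j, m + 1))
                and all(valid(trace, x, beta) for x in ps))
    return False


trace = [({"a", "b", "c"}, 1), ({"a", "b"}, 3), ({"a", "b"}, 3),
         (set(), 3), ({"a"}, 3), ({"a"}, 4)]
phi = ("since", (1, 2), ("ap", "a"), ("and", ("ap", "b"), ("ap", "c")))
P1 = ("S-", 5, ("ap-", 3, "a"), [("and-L", ("ap-", 3, "b")), ("and-L", ("ap-", 4, "b"))])
P2 = ("S-inf", 5, [("and-R", ("ap-", k, "c")) for k in range(1, 5)])
P3 = ("S-", 5, ("ap-", 3, "a"), [("and-R", ("ap-", 3, "c")), ("and-R", ("ap-", 4, "c"))])
for name, proof in [("P1", P1), ("P2", P2), ("P3", P3)]:
    print(name, valid(trace, proof, phi), size(proof))
# P1 True 6
# P2 True 9
# P3 True 6
```

Among these three candidates the smallest proofs have size 6. Note that this only compares the listed candidates; it does not search the space of all valid proofs.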

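We are particularly interested in small proofs as they tend to be easier to understand. Given a trace $\rho$ and a formula $\varphi$, a proof $p$ is *minimal* at time-point $i$ if and only if it is valid at $i$ ($p \vdash \varphi$ and $\mathsf{tp}(p) = i$), and all other valid proofs $q$ (at $i$) have greater or equal size ($q \vdash \varphi$ and $\mathsf{tp}(q) = i$ implies $|p| \le |q|$). In our example, $P_1$ and $P_3$ are minimal.

$$\begin{array}{l}
\textbf{type}\ \mathit{buf} = \mathfrak{p}\ \mathit{list} \times \mathfrak{p}\ \mathit{list} \\
\textbf{type}\ \mathit{buft} = \mathfrak{p}\ \mathit{list} \times \mathfrak{p}\ \mathit{list} \times ((\mathit{ts} \times \mathit{tp})\ \mathit{list}) \\
\textbf{type}\ \mathit{saux} = \{\ \mathsf{ts\_zero} : \mathit{ts}\ \mathit{option},\ \mathsf{ts\_tp\_in} : (\mathit{ts} \times \mathit{tp})\ \mathit{list},\ \mathsf{ts\_tp\_out} : (\mathit{ts} \times \mathit{tp})\ \mathit{list}, \\
\qquad \mathsf{s\_beta\_alphas\_in} : (\mathit{ts} \times \mathfrak{sp})\ \mathit{slist},\ \mathsf{s\_beta\_alphas\_out} : (\mathit{ts} \times \mathfrak{sp})\ \mathit{list}, \\
\qquad \mathsf{v\_alpha\_betas\_in} : (\mathit{ts} \times \mathfrak{vp})\ \mathit{slist},\ \mathsf{v\_alphas\_out} : (\mathit{ts} \times \mathfrak{vp})\ \mathit{slist},\ \mathsf{v\_betas\_in} : (\mathit{ts} \times \mathfrak{vp})\ \mathit{list}, \\
\qquad \mathsf{v\_alphas\_betas\_out} : (\mathit{ts} \times \mathfrak{vp}\ \mathit{option} \times \mathfrak{vp}\ \mathit{option})\ \mathit{list}\ \} \\
\textbf{type}\ \mathit{state} = \mathsf{Pred}^{S}\ \mathit{string} \mid \mathsf{Neg}^{S}\ \mathit{state} \mid \mathsf{And}^{S}\ \mathit{state}\ \mathit{state}\ \mathit{buf} \mid \mathsf{Or}^{S}\ \mathit{state}\ \mathit{state}\ \mathit{buf} \\
\qquad \mid \mathsf{Prev}^{S}\ I\ \mathit{state}\ \mathit{bool}\ \mathfrak{p}\ (\mathit{ts}\ \mathit{list}) \mid \mathsf{Next}^{S}\ I\ \mathit{state}\ \mathit{bool}\ (\mathit{ts}\ \mathit{list}) \\
\qquad \mid \mathsf{Since}^{S}\ I\ \mathit{state}\ \mathit{state}\ \mathit{buft}\ \mathit{saux} \mid \mathsf{Until}^{S}\ I\ \mathit{state}\ \mathit{state}\ \mathit{buft}\ \mathit{uaux} \\
\textbf{function}\ \mathsf{init} :: \mathit{formula} \Rightarrow \mathit{state} \\
\textbf{function}\ \mathsf{eval} :: \mathit{ts} \times \mathit{tp} \Rightarrow \mathit{atom}\ \mathit{set} \Rightarrow \mathit{state} \Rightarrow \mathfrak{p}\ \mathit{list} \times \mathit{state}
\end{array}$$

Fig. 4: Types of the monitor's state and evaluation functions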

### 4 Computing Minimal Proofs

Given an MTL formula $\varphi$, our (online) monitor incrementally processes a trace and for each time-point $i$ it outputs a minimal proof of the satisfaction or violation of $\varphi$ at $i$. The algorithm constructs this minimal proof of $\varphi$ by combining minimal proofs of $\varphi$'s immediate subformulas. To do this efficiently, the monitor maintains just enough information about the trace in its state so that it can guarantee to output minimal proofs. In case the monitored formula includes (bounded) future operators, the monitor's output may be delayed, such that a single event may trigger the output of multiple proofs at once. In this section, we describe our algorithm in detail and explain its correctness.

#### 4.1 Monitor's State

Figure 4 shows the types of our algorithm's main functions init, which computes the monitor's initial state, and eval, which processes a time-stamped event while updating the monitor's state and producing a list of minimal proofs (satisfactions or violations) for an in-order (potentially empty) sequence of time-points. Our monitor's state (type *state* in Figure 4) has the same tree-like structure as the monitored MTL formula. Additionally, it stores operator-specific information for each Boolean and temporal operator. For example, in the state of $\alpha\,\mathcal{S}_I\,\beta$, we store the interval $I$, the states of the subformulas $\alpha$ and $\beta$, a buffer *buft* for proofs (and associated time-stamps) coming from the recursive evaluation of subformulas, and the operator-specific data structure *saux*. Our monitor's overall structure is modeled after VeriMon [29], which has a similar interface (init and eval) and *state* type including the used buffers *buf* and *buft*. The main novelty is our design of the *saux* and *uaux* data structures, which store sufficient information to compute minimal proofs for formulas with topmost operator $\mathcal{S}$ and $\mathcal{U}$. Here, we only describe *saux* in detail.

```
1: procedure UPDATE_SAUX([l, r], τcur, cur, p1, p2, saux)
2:   saux.ts_zero ← if saux.ts_zero = ⊥ then ⌊τcur⌋ else saux.ts_zero
3:   saux ← ADD_SUBPS(τcur, p1, p2, saux)   ▷ update s_beta_alphas_in, s_beta_alphas_out, v_alphas_betas_out, and v_alphas_out
4:   if τcur < THE(saux.ts_zero) + l then
5:     saux.ts_tp_out ← APPEND(saux.ts_tp_out, [(τcur, cur)])
6:     return (S⁻_{<I}(cur), saux)
7:   else
8:     lr ← (if r = ∞ then THE(saux.ts_zero) else MAX(0, τcur − r), τcur − l)
9:     saux ← SHIFT_SAUX(lr, l, τcur, cur, saux)
10:    minimal_proof ← EVAL_SAUX(cur, saux)   ▷ extract proofs; pick one of minimal size
11:    return (minimal_proof, saux)
12:
13: procedure SHIFT_SAUX(lr, l, τcur, cur, saux)
14:   saux ← SHIFT_TS_TPS(lr, l, τcur, cur, saux)   ▷ update ts_tp_out and ts_tp_in
15:   saux ← SHIFT_SAT(lr, saux)   ▷ update s_beta_alphas_out and s_beta_alphas_in
16:   saux ← SHIFT_VIO(lr, saux)   ▷ update v_alphas_betas_out, v_alpha_betas_in, and v_betas_in
17:   saux ← REMOVE_SAUX(lr, saux)   ▷ remove too old proofs (that fell out of the interval)
18:   return saux
```

Algorithm 1: State update algorithm for Since

The data structure *saux* for a formula φ = α S<sub>*I*</sub> β is a record consisting of nine fields. We describe it next, assuming that φ is being evaluated at the current time-point *cur*. Some fields have the type *option*, which means they are of the form ⊥ (if no value is available) or ⌊*v*⌋ (storing the value *v*). The function THE retrieves the optional value from ⌊*v*⌋, i.e., THE(⌊*v*⌋) = *v*. The field ts<sub>*zero*</sub> stores ⊥ in the initial state, and after the first event arrives, it stores the first time-stamp ⌊τ<sub>0</sub>⌋. The fields ts_tp<sub>*in*</sub> and ts_tp<sub>*out*</sub> store lists of time-stamp-time-point pairs inside the interval (between E<sup>p</sup><sub>*cur*</sub>(*I*) and L<sup>p</sup><sub>*cur*</sub>(*I*)) and after the interval (between L<sup>p</sup><sub>*cur*</sub>(*I*) + 1 and *cur*), respectively. The other fields store satisfaction (prefix s_) or violation (prefix v_) proofs. Specifically, s_beta_alphas<sub>*in*</sub> stores S<sup>+</sup> proofs inside the interval and s_beta_alphas<sub>*out*</sub> stores S<sup>+</sup> proofs after the interval. Crucially, while s_beta_alphas<sub>*out*</sub> is an ordinary list, s_beta_alphas<sub>*in*</sub> has type *slist*, a variant of the list type indicating that the stored proofs are sorted in ascending order (with respect to size). We maintain this invariant to reduce the number of proofs we must store: if a proof enters the interval, we can delete all larger proofs that entered the interval prior to it. In addition, we can quickly access the first proof of this list, which necessarily has minimal size. In contrast, s_beta_alphas<sub>*out*</sub> must store all proofs because it is not possible to predict when and which of these proofs will enter the interval.

Furthermore, v_alpha_betas<sub>*in*</sub> is the analogue of s_beta_alphas<sub>*in*</sub> for S<sup>−</sup> proofs with a violation of α inside the interval and a sequence of violations of β until the end of the interval. Note that S<sup>−</sup> proofs can also be constructed using a single violation proof of α that occurs after the interval; these are stored instead in the list v_alphas<sub>*out*</sub>, which is also sorted. Moreover, S<sup>−</sup><sub>∞</sub> proofs require that β is violated at all time-points inside the interval, so v_betas<sub>*in*</sub> stores a suffix of β violations inside the interval. Finally, v_alphas_betas<sub>*out*</sub> stores all α and β violations outside the interval, so that all other components that store violation proofs inside the interval can be efficiently updated when the interval shifts.
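The record and the *slist* invariant described above can be sketched as follows. This is a minimal illustration in Python rather than the tool's OCaml code: the entry layout `(time-stamp, size, proof)`, the helper name `push_sorted`, and the use of `None` for ⊥ are assumptions made for the sketch, and ties in size are kept because the sorted predicate only requires non-decreasing sizes.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional, Tuple

Proof = Any                     # stand-in for a proof object
Sized = Tuple[int, int, Proof]  # (time-stamp, proof size, proof): hypothetical layout


def push_sorted(slist: List[Sized], entry: Sized) -> None:
    """Add a proof that just entered the interval to an slist.

    Older entries that are strictly larger are dropped first, as the text
    describes, so sizes stay in ascending order and the head is minimal.
    """
    _, size, _ = entry
    while slist and slist[-1][1] > size:
        slist.pop()
    slist.append(entry)


@dataclass
class Saux:
    """The nine fields of saux; names follow the text, types are assumptions."""
    ts_zero: Optional[int] = None                                   # None plays the role of ⊥
    ts_tp_in: List[Tuple[int, int]] = field(default_factory=list)   # pairs inside the interval
    ts_tp_out: List[Tuple[int, int]] = field(default_factory=list)  # pairs after the interval
    s_beta_alphas_in: List[Sized] = field(default_factory=list)     # slist of S+ proofs
    s_beta_alphas_out: List[Sized] = field(default_factory=list)    # ordinary list of S+ proofs
    v_alpha_betas_in: List[Sized] = field(default_factory=list)     # slist of S- proofs
    v_alphas_out: List[Sized] = field(default_factory=list)         # sorted α violations
    v_betas_in: List[Sized] = field(default_factory=list)           # suffix of β violations
    v_alphas_betas_out: List[Any] = field(default_factory=list)     # α/β violations outside
```

For instance, pushing proofs of sizes 5, 7, 6, 6 in this order leaves sizes 5, 6, 6 in the list, and a subsequent proof of size 2 evicts all of them.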

#### 4.2 State Update

Algorithm 1 shows the skeleton of our procedure for updating (and simultaneously evaluating) the state of a since operator. The state update for φ = α S<sub>*I*</sub> β is parametrized by the interval *I* = [*l*, *r*], the current time-point *cur* and its time-stamp τ<sub>*cur*</sub>, minimal proofs *p*<sub>1</sub> and *p*<sub>2</sub> (obtained recursively) for the subformulas α and β, respectively, and the current state *saux*. The procedure first checks whether *cur* is the first time-point to arrive and initializes ts<sub>*zero*</sub> accordingly (line 2). Next, we add the new subproofs to their destinations (ADD_SUBPS, line 3). For example, if *p*<sub>1</sub> ∈ sp, then all proofs in s_beta_alphas<sub>*in*</sub> and s_beta_alphas<sub>*out*</sub> are extended with this additional satisfaction proof for α. In contrast, if *p*<sub>1</sub> ∈ vp, then both s_beta_alphas lists are emptied and the violation of α is stored in v_alphas<sub>*out*</sub> and v_alphas_betas<sub>*out*</sub> instead. A similar case distinction happens for *p*<sub>2</sub>. After storing the proofs, we handle the case where *cur* is a time-point at the beginning of the trace for which the past interval has not started yet (lines 4–6), which corresponds to the S<sup>−</sup><sub>&lt;*I*</sub> case depicted in Figure 3(b) on the right. Here, we add a new time-stamp-time-point pair to ts_tp<sub>*out*</sub> (line 5) and return the proof S<sup>−</sup><sub>&lt;*I*</sub>(*cur*) and the updated *saux* (line 6).

In the general case (when the interval has started), we compute the pair *lr* of absolute time-stamps that constitutes the boundaries of the past interval *I* relative to τ<sub>*cur*</sub> (line 8). We use the absolute boundaries to identify a potential interval shift and move proofs in *saux* from the *out* lists to the *in* lists accordingly (line 9). Lines 13–18 show the order in which the various components are shifted. Lastly, we compute a minimal proof (line 10) by performing a case distinction. If s_beta_alphas<sub>*in*</sub> is non-empty, then its head must be a minimal satisfaction proof. Otherwise, the formula is violated and a minimal violation proof is either the head of v_alpha_betas<sub>*in*</sub>, or the head of v_alphas<sub>*out*</sub> (after adding an S<sup>−</sup> constructor), or the application of S<sup>−</sup><sub>∞</sub> to v_betas<sub>*in*</sub> (provided that this suffix spans the entire interval, which can be deduced by comparing the lengths of v_betas<sub>*in*</sub> and ts_tp<sub>*in*</sub>). We extract these (at most three) candidates, compute their sizes, and pick one of minimal size. This minimal proof and the updated *saux* are then returned (line 11).
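The timing checks of lines 4 and 8 and the final case distinction of line 10 can be sketched in Python as follows. The function names and the modelling of proofs as `(size, label)` pairs are assumptions made for illustration; they are not taken from the tool's source.

```python
import math

INF = math.inf


def interval_started(ts_zero, l, ts_cur):
    """Negation of the test on line 4: the past interval has started."""
    return ts_cur >= ts_zero + l


def absolute_bounds(ts_zero, l, r, ts_cur):
    """Line 8: the pair lr of absolute time-stamps bounding the past interval."""
    lower = ts_zero if r == INF else max(0, ts_cur - r)
    return (lower, ts_cur - l)


def eval_saux(sat_head, vio_candidates):
    """Line 10: prefer a stored satisfaction proof, else the smallest violation.

    Proofs are modelled as (size, label) pairs; None marks an absent candidate.
    """
    if sat_head is not None:
        return sat_head
    available = [c for c in vio_candidates if c is not None]
    return min(available, key=lambda c: c[0])
```

With *I* = [1, 2] and first time-stamp 1, `interval_started(1, 1, 1)` is false, while time-stamps 3 and 4 yield the bounds (1, 2) and (2, 3), respectively.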

*Example 3.* To illustrate how the state is updated, we once again consider the formula and trace introduced in Example 1. Figure 5 shows the *saux* states of our algorithm and the produced minimal proof after processing every event. In every state, we only show the non-empty components. Initially, all components of the state are empty except for ts<sub>*zero*</sub>, which is ⊥. When the first event ({*a*, *b*, *c*}, 1) arrives, the list ts_tp<sub>*out*</sub> is updated accordingly and a pair with time-stamp 1 and an S<sup>+</sup> proof using the satisfactions of *b* and *c* is added to s_beta_alphas<sub>*out*</sub>. This proof is not valid for the current time-point 0, because the interval [1, 2] has not yet started, so the monitor outputs the trivial proof S<sup>−</sup><sub>&lt;*I*</sub>(0). The time-stamp of the first event moves inside the interval when the second event ({*a*, *b*}, 3) arrives, and both ts_tp<sub>*out*</sub> and ts_tp<sub>*in*</sub> are updated accordingly. Furthermore, the algorithm extends the S<sup>+</sup> proof previously stored in s_beta_alphas<sub>*out*</sub> by adding ap<sup>+</sup>(1, *a*) to the sequence of *a* satisfactions, after which the resulting proof is moved to s_beta_alphas<sub>*in*</sub>. The algorithm also appends the proof ap<sup>−</sup>(1, *c*) to v_alphas_betas<sub>*out*</sub>. Because s_beta_alphas<sub>*in*</sub> is not empty, the monitor outputs the first proof of this list.

In the next step, event ({*a*, *b*}, 3) arrives and the monitor proceeds similarly, adding the proof ap<sup>+</sup>(2, *a*) to the S<sup>+</sup> proof in s_beta_alphas<sub>*in*</sub>. Aside from outputting the extended satisfaction proof, the algorithm also adds the proof ap<sup>−</sup>(2, *c*) to v_alphas_betas<sub>*out*</sub>.


Fig. 5: The monitor's *saux* states when executing Example 1

When event ({}, 3) arrives, the sequence of *a* satisfactions comes to an end, which means that the proofs in s_beta_alphas<sub>*in*</sub> and s_beta_alphas<sub>*out*</sub> are no longer valid or useful. Hence, we clear both lists. In addition, the proof ap<sup>−</sup>(3, *a*) is stored in v_alphas<sub>*out*</sub>, since the *a* violation happens after the interval. This subproof is also appended to v_alphas_betas<sub>*out*</sub> along with the violation ∧<sup>−</sup><sub>*L*</sub> of the conjunction. The algorithm then constructs a violation proof S<sup>−</sup>(3, ap<sup>−</sup>(3, *a*), []) using the subproof stored in v_alphas<sub>*out*</sub> and outputs it. When ({*a*}, 3) arrives, the algorithm appends the proof ∧<sup>−</sup><sub>*L*</sub> to v_alphas_betas<sub>*out*</sub> and again uses the same subproof stored in v_alphas<sub>*out*</sub> to construct S<sup>−</sup>(4, ap<sup>−</sup>(3, *a*), []). Note that this proof has an associated time-point of 4, which is the only difference from the previous proof that the monitor output.

$$\begin{aligned}
& \mathsf{sorted}(\mathsf{s\_beta\_alphas}_{in}) \land \mathsf{sorted}(\mathsf{v\_alpha\_betas}_{in}) \land \mathsf{sorted}(\mathsf{v\_alphas}_{out}) \;\land \\
(1)\quad & \forall (\tau, u) \in \mathsf{s\_beta\_alphas}_{in}.\ \exists p\ \overline{q}.\ u = \mathcal{S}^{+}(p, \overline{q}) \land u \vdash \alpha\,\mathsf{S}_I\,\beta \land \mathsf{tp}(u) = cur \land \tau = \mathsf{ts}(p) \\
(2)\quad & \forall (\tau, u) \in \mathsf{s\_beta\_alphas}_{out}.\ \exists p\ \overline{q}.\ u = \mathcal{S}^{+}(p, \overline{q}) \land u \vdash \alpha\,\mathsf{S}\,\beta \land \mathsf{tp}(u) = cur \land \tau = \mathsf{ts}(p) \\
(3)\quad & \forall (\tau, u) \in \mathsf{v\_alpha\_betas}_{in}.\ \exists p\ \overline{q}.\ u = \mathcal{S}^{-}(cur, p, \overline{q}) \land u \vdash \alpha\,\mathsf{S}_I\,\beta \land \tau = \mathsf{ts}(p) \\
(4)\quad & \forall (\tau, p) \in \mathsf{v\_alphas}_{out}.\ \mathcal{S}^{-}(cur, p, []) \vdash \alpha\,\mathsf{S}_I\,\beta \land \tau = \mathsf{ts}(p) \\
(5)\quad & \forall (\tau, p) \in \mathsf{v\_betas\_suffix}_{in}.\ \mathsf{E}^{\mathsf{p}}_{cur}(I) \le \mathsf{tp}(p) \le \mathsf{L}^{\mathsf{p}}_{cur}(I) \land p \vdash \beta \land \neg\mathsf{V}(p) \land \tau = \mathsf{ts}(p) \\
(6)\quad & \forall (\tau, p^{*}, q^{*}) \in \mathsf{v\_alphas\_betas}_{out}.\ \exists i \in (\mathsf{L}^{\mathsf{p}}_{cur}(I), cur].\ \tau = \tau_i \land (p^{*} = \bot \lor (\exists p.\ \neg\mathsf{V}(p) \land p^{*} = \lfloor p \rfloor \land p \vdash \alpha)) \\
& \qquad \land\ (q^{*} = \bot \lor (\exists q.\ \neg\mathsf{V}(q) \land q^{*} = \lfloor q \rfloor \land q \vdash \beta))
\end{aligned}$$

Fig. 6: The algorithm's invariant (soundness)

Finally, when the last event ({*a*}, 4) arrives, the interval shifts and ts_tp<sub>*in*</sub> and ts_tp<sub>*out*</sub> change accordingly. At this stage, the algorithm populates v_alpha_betas<sub>*in*</sub> and v_betas<sub>*in*</sub> with the subproofs stored in v_alphas_betas<sub>*out*</sub>. In particular, it constructs and stores the proof S<sup>−</sup>(5, ap<sup>−</sup>(3, *a*), [∧<sup>−</sup><sub>*L*</sub>(ap<sup>−</sup>(3, *b*)), ∧<sup>−</sup><sub>*L*</sub>(ap<sup>−</sup>(4, *b*))]) in v_alpha_betas<sub>*in*</sub>. Moreover, a sequence of violations of the conjunction inside the interval is stored in v_betas<sub>*in*</sub>. This sequence of violations fills the entire interval, so it is then used to construct the proof S<sup>−</sup><sub>∞</sub>(5, [∧<sup>−</sup><sub>*R*</sub>(ap<sup>−</sup>(1, *c*)), ∧<sup>−</sup><sub>*R*</sub>(ap<sup>−</sup>(2, *c*)), ∧<sup>−</sup><sub>*R*</sub>(ap<sup>−</sup>(3, *c*)), ∧<sup>−</sup><sub>*R*</sub>(ap<sup>−</sup>(4, *c*))]). The S<sup>−</sup> proof corresponds precisely to the proof tree presented in Example 1 and the proof object *P*<sub>1</sub> in Example 2, whereas the S<sup>−</sup><sub>∞</sub> proof corresponds to the proof object *P*<sub>2</sub>. Lastly, the sizes of these two proofs are computed, and the algorithm selects the S<sup>−</sup> proof, since it is smaller (i.e., it includes fewer constructors). ■
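The size comparison at the end of Example 3 can be replayed with a small Python sketch. It assumes that a proof's size is simply its number of constructors, as the parenthetical remark suggests; the tuple encoding and the constructor labels (`"S-"`, `"S-inf"`, `"and-L"`, `"and-R"`, `"ap-"`) are hypothetical and chosen only for this illustration.

```python
def size(proof):
    """Count constructors in a proof encoded as nested tuples and lists.

    A tuple is (constructor_name, arg1, arg2, ...); lists hold sequences of
    subproofs; time-points and atom names contribute nothing.
    """
    if isinstance(proof, tuple):
        return 1 + sum(size(arg) for arg in proof[1:])
    if isinstance(proof, list):
        return sum(size(item) for item in proof)
    return 0


# S-(5, ap-(3, a), [and-L(ap-(3, b)), and-L(ap-(4, b))])
p1 = ("S-", 5, ("ap-", 3, "a"),
      [("and-L", ("ap-", 3, "b")), ("and-L", ("ap-", 4, "b"))])

# S-inf(5, [and-R(ap-(1, c)), ..., and-R(ap-(4, c))])
p2 = ("S-inf", 5, [("and-R", ("ap-", i, "c")) for i in range(1, 5)])

smaller = min([p1, p2], key=size)
```

Under this counting, `p1` has 6 constructors and `p2` has 9, so the S<sup>−</sup> proof is selected.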

#### 4.3 Correctness

We now formally describe the invariant we maintain for *saux*. We write ts(*p*) for the time-stamp associated with a proof, i.e., the time-stamp τ<sub>tp(*p*)</sub> of the associated time-point tp(*p*). We also use functional programming notations like λ-abstractions and the list map function. We define the predicate

$$\mathsf{sorted}(seq) := \forall (\tau_i, p_i), (\tau_j, p_j) \in seq.\ (i < j) \land (j < \mathsf{length}(seq)) \rightarrow \tau_i \le \tau_j \land |p_i| \le |p_j|$$

over a sequence of pairs of time-stamps and proofs and assume that every sequence below is monotone with respect to time-stamps (*i* < *j* implies τ<sub>*i*</sub> ≤ τ<sub>*j*</sub>). The fields ts<sub>*zero*</sub>, ts_tp<sub>*in*</sub>, and ts_tp<sub>*out*</sub> are characterized as follows:

$$\mathsf{ts}_{zero} = \begin{cases} \bot & \text{iff } cur = -1 \\ \lfloor \tau_0 \rfloor & \text{iff } cur \ge 0 \end{cases} \qquad \mathsf{ts\_tp}_{in} = \mathsf{map}\ (\lambda i.\ (\tau_i, i))\ \left[ \mathsf{E}_{cur}^{\mathsf{p}}(I), \mathsf{L}_{cur}^{\mathsf{p}}(I) \right]$$

The desired properties of the objects stored in other felds are given in Figure 6.
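As an executable reading of the definitions above, the following Python sketch checks the sorted predicate and builds ts_tp<sub>*in*</sub> from a list of time-stamps. It assumes that proofs are represented only by their sizes and that the bracketed pair denotes the inclusive list of time-points between the earliest and latest time-point of the interval; both are simplifications made for illustration.

```python
def is_sorted(seq):
    """seq is a list of (time-stamp, proof size) pairs in list order.

    For every i < j, both the time-stamps and the sizes must be
    non-decreasing, mirroring the sorted predicate.
    """
    return all(
        seq[i][0] <= seq[j][0] and seq[i][1] <= seq[j][1]
        for i in range(len(seq))
        for j in range(i + 1, len(seq))
    )


def ts_tp_in(timestamps, earliest, latest):
    """map (λi. (τ_i, i)) over the time-points earliest, ..., latest."""
    return [(timestamps[i], i) for i in range(earliest, latest + 1)]
```

For the time-stamps 1, 3, 3, 3, 3, 4 of Example 1 and an interval covering time-points 1 to 4, `ts_tp_in` yields the pairs (3, 1), (3, 2), (3, 3), (3, 4).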

Fig. 7: Visualization of Example 1

We describe each of the invariant's statements. In (1), a proof in s_beta_alphas<sub>*in*</sub> (which must be sorted) must have the form S<sup>+</sup>(*p*, *q̄*) and be a valid proof of α S<sub>*I*</sub> β at the current time-point, with time-stamp ts(*p*). Next, (2) requires proofs to have the same form but to be valid for a modified formula without the interval *I*. In this case, we can relax the timing constraint because these proofs will only be valid at a later time-point, namely once ts(*p*) moves inside the interval. The statement (3) is precisely the same as (1), but for S<sup>−</sup> proofs. In (4), each proof *p* in v_alphas<sub>*out*</sub> (which must also be sorted) must be a valid subproof of an S<sup>−</sup> proof at the current time-point with time-stamp ts(*p*). In (5), each subproof corresponding to the violation of β must be inside the interval with time-stamp ts(*p*). The statement (6) specifies that outside the interval there is either a subproof of a violation of α or β or there are no such proofs. These statements formalize what must hold for the objects stored in *saux*, which yields soundness. We briefly consider completeness, by answering the question of what must be stored, on the example of s_beta_alphas<sub>*in*</sub>:

$$\begin{split} &\forall p\ \overline{q}\ \tau.\ \mathcal{S}^{+}(p, \overline{q}) \vdash \alpha\,\mathsf{S}_I\,\beta \land \mathsf{tp}\left(\mathcal{S}^{+}(p, \overline{q})\right) = cur \land \tau = \mathsf{ts}(p) \rightarrow \\ &\quad \big(\exists p'\ \overline{q}'\ \tau'.\ \left|\mathcal{S}^{+}(p', \overline{q}')\right| \le \left|\mathcal{S}^{+}(p, \overline{q})\right| \land \mathcal{S}^{+}(p', \overline{q}') \vdash \alpha\,\mathsf{S}_I\,\beta \land \tau' = \mathsf{ts}(p') \;\land \\ &\quad\ \ \tau' \ge \tau \land \mathsf{tp}\left(\mathcal{S}^{+}(p', \overline{q}')\right) = \mathsf{tp}\left(\mathcal{S}^{+}(p, \overline{q})\right) \land \left(\tau', \mathcal{S}^{+}(p', \overline{q}')\right) \in \mathsf{s\_beta\_alphas}_{in}\big). \end{split}$$

In words: for any valid S<sup>+</sup> proof for φ = α S<sub>*I*</sub> β at time-point *cur*, we must store in s_beta_alphas<sub>*in*</sub> another proof that is at most as large and at most as old, and that is also valid for φ at *cur*. The other fields of *saux* have similar completeness statements, and so do the other state components.

Together, soundness and completeness ensure that given a formula, a trace, and a time-point *i*, our online monitoring algorithm will eventually output a valid minimal proof at *i*.

### 5 Implementation

We implement our algorithm in a new tool called EXPLANATOR2 [22]. The implementation amounts to around 4 000 lines of OCaml. In addition, a 6 900-line OCaml program is extracted from our Isabelle formalization, which consists of 19 000 lines of definitions and proofs. The extracted program contains the proof object validity checker in the form of a function is_valid : *trace* → *formula* → *proof* → *bool*, which effectively implements what we denote by *p* ⊢ φ. Moreover, it also contains the minimality checker is_minimal : *trace* → *formula* → *proof* → *bool* that, given a trace ρ, a formula φ, and a proof *p*, computes a proof *q* for φ on ρ at time-point tp(*p*) with a minimal size using a verified dynamic programming algorithm and then checks that |*p*| ≤ |*q*|. Note that *q* may differ from *p* because minimal proof objects are not unique. Herasimau [16] provides more details on the formalization and the dynamic programming algorithm. We used the verified validity and minimality checkers to thoroughly test our unverified algorithm. Our tool includes a command line option to enable the verified certification of its output, which slows down computation, as the verified algorithm is rather inefficient, but increases trustworthiness.

EXPLANATOR2 also includes a JavaScript web front end. To this end, we transpile the compiled OCaml bytecode to JavaScript using Js_of_ocaml [36]. The resulting JavaScript library runs in any web browser. We augment the library with an interactive visualization using React [17]. Figure 7 shows the visualization of our Example 1. On the left, the visualization shows the trace (from top to bottom) consisting of the atomic propositions (columns a, b, and c), the time-stamps (column TS) and associated time-points (column TP). The following columns show either the topmost operator of the different subformulas or the atomic propositions of our monitored MTL formula φ = *a* S<sub>[1,2]</sub> (*b* ∧ *c*). In particular, the column labeled with φ's topmost operator, namely S<sub>[1,2]</sub>, shows the Boolean verdicts that a traditional monitor would output. Users of EXPLANATOR2 can further inspect the Boolean verdicts by clicking on them. Figure 7 shows the visualization's state after clicking on φ's violation at time-point 5. The visualization highlights the time interval and the Boolean verdicts for subformulas that justify the verdict associated with the inspected formula and time-point. Furthermore, it shows the relevant violations of φ's subformulas *a* and *b* ∧ *c*: the subformula *a* is violated at time-point 3 and *b* ∧ *c* is violated at time-points 3 and 4, which corresponds to a valid S<sup>−</sup> proof. The user could continue the exploration by further clicking on the two *b* ∧ *c* violations to find out that the tool used *b* violations to justify both. The visualization uses black circles to denote combinations of subformula and time-point that are relevant for at least one of φ's verdicts. The Boolean value for these relevant subformula verdicts is only revealed upon exploration.

- 61 ⊢<sup>−</sup> ((*r* ∧ ¬*q*) ∧ ♦*q*) → (♦<sub>[0,3]</sub> (*p* ∨ *q*)) S *q*, by →<sup>−</sup>
  - 61 ⊢<sup>+</sup> (*r* ∧ ¬*q*) ∧ ♦*q*, by ∧<sup>+</sup>
    - 61 ⊢<sup>+</sup> *r* ∧ ¬*q*, by ∧<sup>+</sup> (subproofs elided)
    - 61 ⊢<sup>+</sup> ♦*q*, by ♦<sup>+</sup>
      - 56 ⊢<sup>+</sup> *q*, by ap<sup>+</sup>, since *q* ∈ {*q*}
  - 61 ⊢<sup>−</sup> (♦<sub>[0,3]</sub> (*p* ∨ *q*)) S *q*, by S<sup>−</sup>
    - 61 ⊢<sup>−</sup> ♦<sub>[0,3]</sub> (*p* ∨ *q*), by ♦<sup>−</sup>
      - 58, ..., 61 ⊢<sup>−</sup> *p* ∨ *q*, by ∨<sup>−</sup> (subproofs elided)
    - 61 ⊢<sup>−</sup> *q*, by ap<sup>−</sup>, since *q* ∉ {*r*}

Fig. 8: Proof of φ<sub>1</sub>'s violation at time-point 61

### 6 Examples

We demonstrate how the minimal proofs produced by our monitor can be useful when trying to comprehend a satisfaction or violation of an MTL formula. To this end, we consider Timescales [34], a benchmark generator for MTL monitors. Timescales uses predefined MTL formulas that represent temporal patterns that commonly occur in real system designs [20]. It generates traces in which the time-stamps are equal to their corresponding time-points. We selected the two most complex properties and generated their corresponding traces. At the end of both traces there is a violation of the pattern, and we use our approach to explain these violations. In addition to the operators presented in Figure 2, we extended our proof system and algorithm with the following operators: ⊤ (truth), ⊥ (falsity), → (implies), ↔ (iff), ■<sub>*I*</sub> (historically), □<sub>*I*</sub> (always), ♦<sub>*I*</sub> (once), and ♢<sub>*I*</sub> (eventually).

*Bounded Recurrence Between q and r.* The bounded recurrence property specifies the following pattern: between events *q* and *r* there is at least one occurrence of event *p* every *u* time units. In MTL, this pattern is captured by the formula φ<sub>1</sub> = (*r* ∧ ¬*q* ∧ ♦*q*) → (♦<sub>[0,*u*]</sub> (*p* ∨ *q*)) S *q*. We set the bound *u* = 3, and we consider the trace ⟨..., ({*q*}, 56), ({}, 57), ({}, 58), ({}, 59), ({}, 60), ({*r*}, 61)⟩, which is the portion pertinent to the proof. The formula φ<sub>1</sub> is violated at time-point 61 and the proof is shown in Figure 8.
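The claimed violation can be double-checked with a naive evaluator. The Python sketch below is not the monitoring algorithm of Section 4: it evaluates the standard past-time semantics directly on the displayed trace fragment, assuming (as Timescales guarantees) that time-stamps equal time-points and ignoring the elided prefix of the trace. All function names are introduced for this illustration.

```python
# Trace fragment: time-point (= time-stamp) -> set of atomic propositions.
TRACE = {56: {"q"}, 57: set(), 58: set(), 59: set(), 60: set(), 61: {"r"}}
FIRST = min(TRACE)


def holds(atom, i):
    return atom in TRACE[i]


def once(pred, i, lo=0, hi=None):
    """Once_[lo,hi]: pred held at some j <= i with lo <= i - j <= hi."""
    return any(
        pred(j)
        for j in range(FIRST, i + 1)
        if i - j >= lo and (hi is None or i - j <= hi)
    )


def since(alpha, beta, i):
    """Unbounded since: beta at some j <= i and alpha at every k in (j, i]."""
    return any(
        beta(j) and all(alpha(k) for k in range(j + 1, i + 1))
        for j in range(FIRST, i + 1)
    )


def recent_p_or_q(i):
    # Once_[0,3] (p or q)
    return once(lambda j: holds("p", j) or holds("q", j), i, 0, 3)


def premise(i):
    # r and not q and Once q
    return holds("r", i) and not holds("q", i) and once(lambda j: holds("q", j), i)


def conclusion(i):
    # (Once_[0,3] (p or q)) S q
    return since(recent_p_or_q, lambda j: holds("q", j), i)


def phi1(i):
    return (not premise(i)) or conclusion(i)
```

On this fragment, the premise holds at time-point 61 while `recent_p_or_q` already fails at time-point 60, so `phi1(61)` evaluates to false.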

To prove the violation of the implication (the formula's topmost operator), the subformula on the left (assumption) must be satisfied and the subformula on the right (conclusion) must be violated. For this reason, two subproofs are constructed. In the left subproof, we can see that the subformula on the left is satisfied because both conjuncts *r* ∧ ¬*q* and ♦*q* are satisfied at time-point 61. This part of the formula enforces that: (i) *r* is satisfied (and *q* is not satisfied) at the current time-point; and (ii) *q* is satisfied at some point in the past. Note that (ii) corresponds exactly to ♦*q*. In the left subproof, we have 61 ⊢<sup>+</sup> *r* ∧ ¬*q* because *r* is satisfied and *q* is violated at time-point 61. Moreover, the proof 61 ⊢<sup>+</sup> ♦*q* uses the fact that *q* is satisfied at time-point 56, which is when the last *q* arrived. Moving to the subproof 61 ⊢<sup>−</sup> (♦<sub>[0,3]</sub> (*p* ∨ *q*)) S *q*, the violation occurs because both subformulas are violated at time-point 61. The subproof 61 ⊢<sup>−</sup> ♦<sub>[0,3]</sub> (*p* ∨ *q*) uses the violations of *p* and *q* in the last 3 time units (58, ..., 61), whereas the proof 61 ⊢<sup>−</sup> *q* indicates that *q* is not satisfied at the current time-point. This is sufficient to show that, since the last *q* arrived (at time-point 56), it is neither the case that a new sequence started (with a new occurrence of *q*) nor that a sequence finished (with an occurrence of *p*) within 3 time units in the past.

Fig. 9: Visualization of φ<sub>1</sub>'s violation at time-point 61

Figure 9 shows our visualization of the above proof. Starting from →, the columns show the topmost operators of φ<sub>1</sub>'s subformulas (including atomic propositions). For example, φ<sub>1</sub> is violated because the left subformula is satisfied (the first ∧ column) and the right subformula is violated (column S<sub>[0,∞)</sub>). All subformulas have a corresponding column, and the order of the columns is such that immediate subformulas of a subformula appear further to the right. The same atomic proposition may occur in different subformulas, in which case there will be multiple columns showing the same proposition (but potentially different time-points of interest). Continuing our example, the right subproof from Figure 8 starts in column S<sub>[0,∞)</sub> in Figure 9. The formula (♦<sub>[0,3]</sub> (*p* ∨ *q*)) S *q* is violated at time-point 61 because both subformulas are violated. In the visualization, we focus (by clicking) on the subformula ♦<sub>[0,3]</sub> (*p* ∨ *q*) (displayed when hovering over the corresponding cell) and observe that it is violated because *p* ∨ *q* is violated at time-points 58, ..., 61 (highlighted cells in the ∨ column). Also, the context of this subproof, i.e., all parent nodes in the proof tree, is highlighted. In this case, these are → and S<sub>[0,∞)</sub> at time-point 61. Even though it presents exactly the same information as the proof tree, our interactive visualization makes the proofs easier to navigate, explore, and digest.

*Bounded Response Between q and r.* Closely related to the bounded recurrence, the bounded response property specifies the following pattern: between events *q* and *r*, event *s* must respond to event *p* within the interval [*l*, *u*]. In MTL, this pattern is specified by the formula φ<sub>2</sub> = ((*r* ∧ ¬*q*) ∧ ♦*q*) → (((*s* → ♦<sub>[*l*,*u*]</sub> *p*) ∧ ¬(¬*s* S<sub>[*u*,∞)</sub> *p*)) S *q*). We consider the trace ⟨..., ({*q*}, 58), ({*p*}, 59), ({}, 60), ({}, 61), ({}, 62), ({}, 63), ({*r*}, 64)⟩ and set *l* = 0 and *u* = 3. Figure 10 shows a violation proof for φ<sub>2</sub> at time-point 64.

Fig. 11: Visualization of φ2's violation at time-point 64

The implication's assumption in φ<sub>2</sub> is the same as the assumption in φ<sub>1</sub> (the *bounded recurrence* formula). We omit the corresponding subproof *P* from Figure 10 as it has the same structure as the subproof of the *bounded recurrence* example. (Yet, there are differences in the time-points.) The conclusion of φ<sub>2</sub> has the form α S *q*. It is violated at time-point 64 because α is violated at time-point 62, and from this point onward until the current time-point 64, *q* is always violated. According to our proof system, we only need to consider violations of *q* starting at time-point 62, because α is violated at that point. The formula α = (*s* → ♦<sub>[0,3]</sub> *p*) ∧ ¬(¬*s* S<sub>[3,∞)</sub> *p*) captures two properties: (i) if there is a response *s*, then there must be a recent challenge *p* (i.e., *p* must be satisfied within the last 3 time units); (ii) there are no challenges *p* more than 3 time units in the past without a response *s*. In our proof, the violation of α is constructed using the violation of (ii). After applying the negation rule, the proof 62 ⊢<sup>+</sup> ¬*s* S<sub>[3,∞)</sub> *p* uses the fact that *p* is satisfied at time-point 59 and that *s* is violated at time-points 60, 61, and 62. In other words, there was no response *s* to the challenge *p* within the required time constraint. Figure 11 shows the visualization of this subproof. While the static image already helps with the intuition, we invite the reader to explore this and the previous example in our interactive visualization.
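The reasoning about α and φ<sub>2</sub> can likewise be replayed with a naive evaluator. As before, this Python sketch only evaluates the standard semantics on the displayed trace fragment; it assumes that time-stamps equal time-points, ignores the elided prefix, and uses helper names invented for the illustration.

```python
# Trace fragment: time-point (= time-stamp) -> set of atomic propositions.
TRACE = {58: {"q"}, 59: {"p"}, 60: set(), 61: set(), 62: set(), 63: set(), 64: {"r"}}
FIRST = min(TRACE)


def holds(atom, i):
    return atom in TRACE[i]


def once(pred, i, lo=0, hi=None):
    """Once_[lo,hi]: pred held at some j <= i with lo <= i - j <= hi."""
    return any(
        pred(j)
        for j in range(FIRST, i + 1)
        if i - j >= lo and (hi is None or i - j <= hi)
    )


def since(alpha, beta, i, lo=0):
    """Since_[lo,inf): beta at some j with i - j >= lo, alpha at every k in (j, i]."""
    return any(
        i - j >= lo and beta(j) and all(alpha(k) for k in range(j + 1, i + 1))
        for j in range(FIRST, i + 1)
    )


def alpha(i):
    # (s -> Once_[0,3] p) and not (not s S_[3,inf) p)
    responded = (not holds("s", i)) or once(lambda j: holds("p", j), i, 0, 3)
    overdue = since(lambda k: not holds("s", k), lambda j: holds("p", j), i, lo=3)
    return responded and not overdue


def phi2(i):
    premise = holds("r", i) and not holds("q", i) and once(lambda j: holds("q", j), i)
    conclusion = since(alpha, lambda j: holds("q", j), i)
    return (not premise) or conclusion
```

On this fragment, α still holds at time-point 61 and first fails at time-point 62, three time units after the challenge *p* at 59, which makes `phi2(64)` false.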

### 7 Performance

We empirically evaluate our tool by answering the following research question: How does EXPLANATOR2 scale with respect to the formula size when compared to other state-of-the-art monitoring tools? To this end, we reuse the evaluation setup of the MTL monitor HYDRA [26]. We consider two different settings: (i) past-only MTL formulas; and (ii) MTL formulas (mixing past and future operators). For each setting, we pseudo-randomly generate a trace with 100 000 events and collections of five different formulas for each size *s* ∈ {6, 17, ..., 50}. We measure the time and space usage of EXPLANATOR2, HYDRA, VYDRA [27], AERIAL [3], MONPOLY [5], and VERIMON [29]. Our verified dynamic programming algorithm is not included because it times out (with a time-out of 200 seconds) even for the smallest formulas of size 6. The experiments were conducted on a computer with an AMD Ryzen 5 5600X CPU and 16GB of RAM. The results are presented in Figure 12. Each filled shape is an average of the measurements for the corresponding formula size. (Unfilled shapes show the individual runs, but are sometimes invisible.) The axes showing time and space usage measurements are of logarithmic scale.

Time-wise, EXPLANATOR2 outperforms MONPOLY and VERIMON (first-order monitors), and is on par with most of its competitors in the past-only setting. When we include future operators, EXPLANATOR2 performs worse than its competitors, although only by a narrow margin. However, we must consider that, in contrast to the others, our tool has a clear disadvantage: it produces checkable and understandable output instead of Boolean verdicts. Thus, these results reassure us that we do not compromise too much by providing this feature, and that our algorithm is indeed efficient. In terms of space usage, EXPLANATOR2 performs worse than other monitoring tools in both settings. This is hardly surprising, given that proofs can be huge (e.g., they may contain the entire trace).

### 8 Conclusion

We have developed an online MTL monitor that outputs detailed verdicts in the form of proof trees, which serve as both understandable explanations and checkable certificates. Our monitor incorporates a formally verified checker and an interactive visualization. Our empirical evaluation demonstrates the reasonable performance of our monitor, even though it provides a strictly more informative output than its competitors. Overall, we believe that our approach significantly improves the user experience when using an MTL monitor. In particular, the generated explanations provide insight into root causes of violations and can help with specification debugging. Another plausible application of explanations is teaching temporal logics to students and engineers.

As future work, we will lift our approach to the more expressive metric first-order temporal logic. The main challenge here is to incorporate parametric events and quantification. Moreover, we are interested in optimizing other aspects of the proofs than their size.

*Data Availability Statement* EXPLANATOR2 is available under the GNU Lesser General Public License v3.0 [22] and its interactive visualization is hosted on GitHub. Our artifact [23] contains the snapshot of the tool's source code at paper submission time along with instructions on how to run our test suite and to reproduce our evaluation.

### References


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **12th Competition on Software Verification — SV-COMP 2023**

## Competition on Software Verification and Witness Validation: SV-COMP 2023

### Dirk Beyer <sup>B</sup>

LMU Munich, Munich, Germany

Abstract. The 12th edition of the Competition on Software Verification (SV-COMP 2023) is again the largest overview of tools for software verification, evaluating 52 verification systems from 34 teams from 10 countries. Besides providing an overview of the state of the art in automatic software verification, the goal of the competition is to establish standards, provide a platform for exchange to developers of such tools, educate PhD students on reproducibility approaches and benchmarking, and provide computing resources to developers that do not have access to compute clusters. The competition consisted of 23 805 verification tasks for C programs and 586 verification tasks for Java programs. The specifications include reachability, memory safety, overflows, and termination. This year, the competition introduced a new competition track on witness validation, where validators for verification witnesses are evaluated with respect to their quality.

Keywords: Formal Verification · Program Analysis · Competition · Software Verification · Verification Tasks · Benchmark · C Language · Java Language · SV-Benchmarks · BenchExec · CoVeriTeam

### 1 Introduction

This report extends the series of competition reports (see footnote) by describing the results of the 2023 edition, explaining the process and rules, and giving insights into some aspects of the competition (this time, the focus is on the added validation track). The 12th Competition on Software Verification (SV-COMP, https://sv-comp.sosy-lab.org/2023) is the largest comparative evaluation ever in this area. The objectives of the competitions were discussed earlier (1-4 [16]) and extended over the years (5-6 [17]):



This report extends previous reports on SV-COMP [10, 11, 12, 13, 14, 15, 16, 17, 18, 20].

Reproduction packages are available on Zenodo (see Table 3).

<sup>B</sup> dirk.beyer@sosy-lab.org

https://doi.org/10.1007/978-3-031-30820-8\_29 S. Sankaranarayanan and N. Sharygina (Eds.): TACAS 2023, LNCS 13994, pp. 495–522, 2023.


The SV-COMP 2020 report [17] discusses the achievements of the SV-COMP competition so far with respect to these objectives.

Related Competitions. There are many competitions in the area of formal methods [9], because it is well-understood that competitions are a fair and accurate means to execute a comparative evaluation with involvement of the developing teams. We refer to a previous report [17] for a more detailed discussion and give here only the references to the most related competitions [22, 58, 67, 74].

Quick Summary of Changes. While we try to keep the setup of the competition stable, there are always improvements and developments. For the 2023 edition, the following changes were made:


### 2 Organization, Definitions, Formats, and Rules

Procedure. The overall organization of the competition did not change in comparison to the earlier editions [10, 11, 12, 13, 14, 15, 16, 17, 18]. SV-COMP is an open competition (also known as comparative evaluation), where all verification tasks are known before the submission of the participating verifiers, which is necessary due to the complexity of the C language. The procedure is partitioned into the benchmark submission phase, the training phase, and the evaluation phase. The participants received the results of their verifier continuously via e-mail (for preruns and the final competition run), and the results were publicly announced on the competition web site after the teams inspected them.

Competition Jury. Traditionally, the competition jury consists of the chair and one member of each participating team; the team-representing members circulate every year after the candidate-submission deadline. This committee reviews the competition contribution papers and helps the organizer with resolving any disputes that might occur (cf. competition report of SV-COMP 2013 [11]). The tasks of the jury were described in more detail in the report of SV-COMP 2022 [20]. The team representatives of the competition jury are listed in Table 5.

Table 1: Scoring schema for SV-COMP 2023 (unchanged from 2021 [18])

Fig. 1: Visualization of the scoring schema for the reachability property (unchanged from 2021 [18])

Scoring Schema and Ranking. The scoring schema of SV-COMP 2023 was the same as for SV-COMP 2021. Table 1 provides an overview and Fig. 1 visually illustrates the score assignment for the reachability property as an example. As before, the rank of a verifier was decided based on the sum of points (normalized for meta categories). In case of a tie, the rank was decided based on success run time, which is the total CPU time over all verification tasks for which the verifier reported a correct verification result. Opt-out from categories and score normalization for meta categories were handled as described previously [11, page 597].
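The ranking rule can be sketched in a few lines of Python. This is an illustrative sketch only: the tool names, scores, and CPU times are invented, and `rank_verifiers` is a hypothetical helper, not part of the competition scripts.

```python
from typing import List, NamedTuple


class Result(NamedTuple):
    name: str             # verifier name (made up for this example)
    score: float          # sum of points (already normalized for meta categories)
    success_cpu_s: float  # total CPU time over all correctly solved tasks


def rank_verifiers(results: List[Result]) -> List[Result]:
    """Order by score (higher is better); break ties by success run time (lower is better)."""
    return sorted(results, key=lambda r: (-r.score, r.success_cpu_s))


if __name__ == "__main__":
    demo = [
        Result("tool-a", 1200, 54000.0),
        Result("tool-b", 1350, 81000.0),
        Result("tool-c", 1200, 39000.0),
    ]
    for rank, r in enumerate(rank_verifiers(demo), start=1):
        print(rank, r.name, r.score, r.success_cpu_s)
```

In the demo, `tool-a` and `tool-c` have the same score, so the tie is resolved in favor of `tool-c`, which needed less CPU time for its correct results.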

License Requirements. Starting 2018, SV-COMP required that the verifier must be publicly available for download and has a license that



Table 2: Publicly available components for reproducing SV-COMP 2023



Task-Definition Format 2.0. SV-COMP 2023 used the task-definition format in version 2.0. More details can be found in the report for Test-Comp 2021 [19].

Properties. Please see the 2015 competition report [13] for the definition of the properties and the property format. All specifications used in SV-COMP 2023 are available in the directory c/properties/ of the benchmark repository.

Categories. The (updated) category structure of SV-COMP 2023 is illustrated by Fig. 2. Category C-FalsificationOverall contains all verification tasks of C-Overall without Termination and Java-Overall contains all Java verification tasks. Compared to SV-COMP 2022, we added the new sub-category ReachSafety-Hardware to main category ReachSafety and the sub-categories ConcurrencySafety-MemSafety, ConcurrencySafety-NoOverflows, and ConcurrencySafety-NoDataRace-Main (was demo in 2022) to main category ConcurrencySafety; we restructured main category NoOverflows; and finally, we added SoftwareSystems-DeviceDriversLinux64-MemSafety to main category SoftwareSystems. The categories are also listed in Tables 8, 9, and 10, and described in detail on the competition web site (https://sv-comp.sosy-lab.org/2023/benchmarks.php).

Reproducibility. SV-COMP results must be reproducible, and consequently, all major components are maintained in public version-control repositories. The overview of the components is provided in Fig. 3, and the details are given in Table 2. We refer to the SV-COMP 2016 report [14] for a description of all components of the SV-COMP organization. There are competition artifacts at Zenodo (see Table 3) to guarantee their long-term availability and immutability.

Fig. 2: Category structure for SV-COMP 2023

Fig. 3: Benchmarking components of SV-COMP and competition's execution flow (same as for SV-COMP 2020)


Table 4: Validation: Witness validators and witness linter

Competition Workflow. The workflow of the competition is described in the report for Test-Comp 2021 [19] (SV-COMP and Test-Comp use a similar workflow). For a description of how to reproduce single verification runs and a trouble-shooting guide, we refer to the previous report [20, Sect. 3].

### 3 Participating Verifiers and Validators

The participating verification systems are listed in Table 5. The table contains the verifier name (with hyperlink), references to papers that describe the systems, the representing jury member and the affiliation. The listing is also available on the competition web site at https://sv-comp.sosy-lab.org/2023/systems.php. Table 6 lists the algorithms and techniques that are used by the verification tools, and Table 7 gives an overview of commonly used solver libraries and frameworks.

Validation of Verification Results. The validation of the verification results was done by eleven validation tools (ten proper witness validators, and one witness linter for syntax checks), which are listed in Table 4, including references to literature. The ten witness validators are evaluated based on all verification witnesses that were produced in the verification track of the competition.

Table 5: Verification: Participating verifiers with tool references and representing jury members; <sup>new</sup> for first-time participants, <sup>∅</sup> for hors-concours participation

Table 6: Algorithms and techniques

Table 7: Solver libraries and frameworks

Hors-Concours Participation. As in previous years, we also included verifiers in the evaluation that did not actively compete or that should not occur in the rankings for some reasons (e.g., meta verifiers based on other competing tools, or tools for which the submitting teams were not sure if they show the full potential of the tool). These participations are called hors concours, as they cannot participate in rankings and cannot "win" the competition. Those verifiers are marked as 'hors concours' in Table 5 and others, and the names are annotated with a symbol (<sup>∅</sup>).

### 4 Results of the Verification Track

The results of the competition represent the state of the art of what can be achieved with fully automatic software-verification tools on the given benchmark set. We report the effectiveness (number of verification tasks that can be solved and correctness of the results, as accumulated in the score) and the efficiency (resource consumption in terms of CPU time and CPU energy). The results are presented in the same way as in previous years, such that the improvements compared to the last years are easy to identify. The results presented in this report were inspected and approved by the participating teams.

Table 8: Verification: Quantitative overview over all regular results;

Table 9: Verification: Quantitative overview over all hors-concours results; empty cells represent opt-outs, <sup>new</sup> for first-time participants, <sup>∅</sup> for hors-concours participation

Quantitative Results. Tables 8 and 9 present the quantitative overview of all tools and all categories. Due to the large number of tools, we need to split the presentation into two tables, one for the verifiers that participate in the rankings (Table 8), and one for the hors-concours verifiers (Table 9). The head row mentions the category, the maximal score for the category, and the number of verification tasks. The tools are listed in alphabetical order; every table row lists the scores of one verifier. We indicate the top three candidates by formatting their scores in bold face and in larger font size. An empty table cell means that the verifier opted out from the respective main category (perhaps participating in subcategories only, restricting the evaluation to a specific topic). More information (including interactive tables, quantile plots for every category, and also the raw data in XML format) is available on the competition web site (https://sv-comp.sosy-lab.org/2023/results) and in the results artifact (see Table 3).

Table 10: Verification: Overview of the top-three verifiers for each category; <sup>new</sup> for first-time participants, values for CPU time and energy rounded to two significant digits

Table 10 reports the top three verifiers for each category. The run time (column 'CPU Time') and energy (column 'CPU Energy') refer to successfully solved verification tasks (column 'Solved Tasks'). We also report the number of tasks for which no witness validator was able to confirm the result (column 'Unconf. Tasks'). The columns 'False Alarms' and 'Wrong Proofs' report the number of verification tasks for which the verifier reported wrong results, i.e., reporting a counterexample when the property holds (incorrect False) and claiming that the program fulfills the property although it actually contains a bug (incorrect True), respectively.

Fig. 4: Quantile functions for category C-Overall. Each quantile function illustrates the quantile (x-coordinate) of the scores obtained by correct verification runs below a certain run time (y-coordinate). More details were given previously [11]. A logarithmic scale is used for the time range from 1 s to 1000 s, and a linear scale is used for the time range between 0 s and 1 s.

Score-Based Quantile Functions for Quality Assessment. We use score-based quantile functions [11, 34] because these visualizations make it easier to understand the results of the comparative evaluation. The results archive (see Table 3) and the web site (https://sv-comp.sosy-lab.org/2023/results) include such a plot for each (sub-)category. As an example, we show the plot for category C-Overall (all verification tasks) in Fig. 4. A total of 13 verifiers participated in category C-Overall, for which the quantile plot shows the overall performance over all categories (scores for meta categories are normalized [11]). A more detailed discussion of score-based quantile plots, including examples of what insights one can obtain from the plots, is provided in previous competition reports [11, 14].

The winner of the competition, UAutomizer, achieves the best cumulative score (graph for UAutomizer has the longest width from x = 0 to its right end). Verifiers whose graphs start with a negative cumulative score produced wrong results.
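To make the shape of such a plot concrete, the following Python sketch computes the points of a score-based quantile function for one verifier. It is a simplified reading of the description above, not the plotting code used by the competition: it assumes that the graph starts at the sum of the negative scores of wrong results and then adds the scores of correct results from fastest to slowest. The function name `quantile_points` and all scores and run times in the demo are invented for illustration.

```python
def quantile_points(runs):
    """Compute (cumulative score, run time) points for one verifier.

    runs: iterable of (score, cpu_time_s, is_correct) tuples.
    Wrong results only shift the start of the graph to the left;
    correct results are accumulated in order of increasing run time.
    """
    runs = list(runs)
    # Starting offset: total (negative) score of all wrong results.
    total = sum(score for score, _, ok in runs if not ok)
    points = []
    for cpu_time_s, score in sorted((t, s) for s, t, ok in runs if ok):
        total += score
        points.append((total, cpu_time_s))
    return points


if __name__ == "__main__":
    demo_runs = [
        (2, 10.0, True),
        (1, 0.5, True),
        (-32, 3.0, False),  # one wrong result shifts the whole graph
        (2, 100.0, True),
    ]
    for x, y in quantile_points(demo_runs):
        print(f"score={x:>4}  time={y:>6.1f} s")
```

Under this reading, the right end of the graph is the total score of the verifier, and a graph that starts left of x = 0 indicates at least one wrong result.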

New Verifiers. To acknowledge the verification systems that participate for the first or second time in SV-COMP, Table 11 lists the new verifiers (in SV-COMP 2022 or SV-COMP 2023). It is remarkable to see that first-time participants can win or almost win large categories: Bubaak<sup>new</sup> is the best verifier for category FalsificationOverall, and Bubaak<sup>new</sup> is the second-best and Mopsa<sup>new</sup> third-best in category SoftwareSystems. Figure 5 shows the growing interest in the competition over the years.

Fig. 5: Number of evaluated verifiers for each year (first-time participants on top)

Table 11: New verifiers in SV-COMP 2022 and SV-COMP 2023; column 'Subcategories' gives the number of executed categories (including demo category NoDataRace), <sup>new</sup> for first-time participants, <sup>∅</sup> for hors-concours participation


Computing Resources. The resource limits were the same as in the previous competitions [14], except for the upgraded operating system: Each verification run was limited to 8 processing units (cores), 15 GB of memory, and 15 min of CPU time. Witness validation was limited to 2 processing units, 7 GB of memory, and 1.5 min of CPU time for violation witnesses and 15 min of CPU time for correctness witnesses. The machines for running the experiments are part of a compute cluster that consists of 168 machines; each verification run was executed on an otherwise completely unloaded, dedicated machine, in order to achieve precise measurements. Each machine had one Intel Xeon E3-1230 v5 CPU, with 8 processing units each, a frequency of 3.4 GHz, 33 GB of RAM, and a GNU/Linux operating system (x86\_64-linux, Ubuntu 22.04 with Linux kernel 5.15). We used BenchExec [34] to measure and control computing resources (CPU time, memory, CPU energy) and VerifierCloud to distribute, install, run, and clean-up verification runs, and to collect the results. The values for time and energy are accumulated over all cores of the CPU. To measure the CPU energy, we used CPU Energy Meter [38] (integrated in BenchExec [34]).

Fig. 6: Scoring schema for evaluation of validators; p = −16 for SV-COMP 2023; figure adopted from [37]

One complete verification execution of the competition consisted of 490 858 verification runs in 91 run sets (each verifier on each verification task of the selected categories according to the opt-outs), consuming 1 114 days of CPU time and 299 kWh of CPU energy (without validation). Witness-based result validation required 4.59 million validation runs in 1 527 run sets (each validator on each verification task for categories with witness validation, and for each verifier), consuming 877 days of CPU time. Each tool was executed several times, in order to make sure no installation issues occur during the execution. Including these preruns, the infrastructure managed a total of 2.78 million verification runs in 560 run sets (verifier × property) consuming 13.8 years of CPU time, and 35.9 million validation runs in 11 532 run sets (validator × verifier × property) consuming 17.8 years of CPU time. This means that also the load of the experiment infrastructure increased and was larger than ever before.
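For a sense of scale, the totals above can be turned into per-run averages. The short Python calculation below uses only the figures quoted in the text; because "4.59 million" is a rounded count, the validation average is approximate, and the helper name `mean_cpu_seconds` is made up for this example.

```python
SECONDS_PER_DAY = 24 * 60 * 60
JOULES_PER_KWH = 3.6e6


def mean_cpu_seconds(total_days: float, runs: int) -> float:
    """Average CPU time per run in seconds, given a total in days."""
    return total_days * SECONDS_PER_DAY / runs


# Figures from the final competition execution (without preruns).
verification_s = mean_cpu_seconds(1114, 490_858)
validation_s = mean_cpu_seconds(877, 4_590_000)  # run count is rounded in the text
energy_j = 299 * JOULES_PER_KWH / 490_858        # CPU energy per verification run

if __name__ == "__main__":
    print(f"verification: {verification_s:.0f} s of CPU time per run")
    print(f"validation:   {validation_s:.1f} s of CPU time per run")
    print(f"verification: {energy_j / 1000:.1f} kJ of CPU energy per run")
```

This gives roughly 196 s of CPU time and about 2.2 kJ of CPU energy per verification run, and about 16.5 s of CPU time per validation run, both well below the respective time limits.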


Table 12: Validation of violation witnesses: Overview of the top-three verifiers for each category; values for CPU time and energy rounded to two significant digits

### 5 Results of the Witness-Validation Track

The validation of verification results, in particular, verification witnesses, becomes more and more important for various reasons: verification witnesses justify and help to understand and interpret a verification result, they serve as exchange object for intermediate results, and they allow to make use of imprecise verification techniques (e.g., via machine learning). A case study on the quality of the results of witness validators [37] suggested that validators for verification results should also undergo a periodical comparative evaluation and proposed a scoring schema for witness-validation results. SV-COMP 2023 evaluated 10 validators on more than 100 000 verification witnesses.


Table 13: Validation of correctness witnesses: Overview of the top-three verifiers for each category; values for CPU time and energy rounded to two significant digits

Scoring Schema for Validation Track. The score of a validator in a subcategory is computed as

$$\mathit{score} = \left(\frac{p_{\text{correct}^*}}{|\text{correct}^*|} + q \cdot \frac{p_{\text{wrong}}}{|\text{wrong}|}\right) \cdot \frac{|\text{correct}^*| + |\text{wrong}|}{2}$$

where the points in p<sub>correct<sup>∗</sup></sub> and p<sub>wrong</sub> are determined according to the schema in Fig. 6 and then normalized using the normalization schema that SV-COMP uses for meta categories [11, page 597], except for the factor q, which gives a higher weight to wrong witnesses. Wrong witnesses are witnesses that do not agree with the expected verification verdict. Witnesses that agree with the expected verification verdict cannot be automatically treated as correct because we do not yet have an established way to determine this. Therefore, we call this class of witnesses correct<sup>∗</sup>. Further details are given in the proposal [37]. This schema relates to each base category from the verification track a meta category that consists of two sub-categories, one with the correct<sup>∗</sup> and one with the wrong witnesses.
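The score formula can be written down directly as a small Python function. This is a minimal sketch under stated assumptions: the value of the weight q is not given in this section, so `q=2` in the demo is a placeholder, the point totals are invented, and the choice to reject empty sub-categories is ours rather than part of the published schema.

```python
def validator_score(p_correct, n_correct, p_wrong, n_wrong, q):
    """Score of a validator in one category, following the formula above.

    p_correct, n_correct: points earned on, and number of, correct* witnesses
    p_wrong, n_wrong:     points earned on, and number of, wrong witnesses
    q:                    extra weight for the wrong-witness sub-category
                          (its value is an assumption in this sketch)
    """
    if n_correct <= 0 or n_wrong <= 0:
        # Assumption: the formula is only applied when both classes are non-empty.
        raise ValueError("both sub-categories must contain at least one witness")
    # Average points per witness in each sub-category, weighted by q ...
    average = p_correct / n_correct + q * p_wrong / n_wrong
    # ... scaled by the mean size of the two sub-categories.
    return average * (n_correct + n_wrong) / 2


if __name__ == "__main__":
    # Hypothetical numbers: 150 points on 100 correct* witnesses,
    # 25 points on 50 wrong witnesses, and q = 2.
    print(validator_score(150, 100, 25, 50, q=2))  # prints 187.5
```

Because each sub-category contributes its average points per witness, a small set of wrong witnesses carries as much weight as a large set of correct<sup>∗</sup> witnesses before q is applied, which is the effect of the meta-category normalization.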

Tables 12 and 13 show the rankings of the validators. False alarms in Table 12 are claims of a validator that the program contains a bug described by a given violation witness although the program is correct (the validator confirms a wrong violation witness). Wrong proofs in Table 13 are claims of a validator that the program is correct according to invariants in a given correctness witness although the program contains a bug (the validator confirms a wrong correctness witness). The scoring schema significantly punishes results that confirm a wrong verification witness, as visible for validator MetaVal in Table 13.

Table 13 shows that there are categories that are supported by fewer than three validators ('missing validators'). This reveals a remarkable gap in software-verification research:

There are verification results that cannot be independently confirmed, according to the state of the art in software verification.

### 6 Conclusion

The 12th edition of the Competition on Software Verification (SV-COMP 2023) again increased the number of participating systems and gave the largest ever overview over software-verification tools, with 52 participating verification systems (incl. 9 new verifiers and 18 hors-concours; see Fig. 5 for the participation numbers and Table 5 for the details). For the first time, a thorough comparative evaluation of 10 validation tools was performed; the validation tools were assessed in a similar manner as in the verification track, using a community-agreed scoring schema [37] which is derived from the scoring schema of the verification track. The number of verification tasks in SV-COMP 2023 was significantly increased to 23 805 in the C category. The high quality standards of the TACAS conference are ensured by a competition jury, with a member from each actively participating team. We hope that the broad overview of verification tools stimulates the further advancements of software verification, and in particular, the validation track showed some open problems that should be addressed.

Data-Availability Statement. The verification tasks and results of the competition are published at Zenodo, as described in Table 3. All components and data that are necessary for reproducing the competition are available in public version repositories, as specified in Table 2. For easy access, the results are presented also online on the competition web site https://sv-comp.sosy-lab.org/2023/results. The main results were reproduced in an independent reproduction study [66].

Funding Statement. This project was funded in part by the Deutsche Forschungsgemeinschaft (DFG) — 418257054 (Coop).

Acknowledgements. We thank Marcus Gerhold and Arnd Hartmanns for their reproduction study [66] on SV-COMP 2023.

### References



## Symbiotic-Witch 2: More Efficient Algorithm and Witness Refutation<sup>⋆</sup> (Competition Contribution)

Paulína Ayaziová and Jan Strejček

Masaryk University, Brno, Czech Republic {xayaziov,strejcek}@fi.muni.cz

Abstract. The new version of the witness validator Symbiotic-Witch follows more precisely the (fixed version of the) semantics of verification witnesses. This makes the tool more efficient as it can benefit from sink nodes. Further, the tool can now refute a witness. To sum up, Symbiotic-Witch 2 can confirm or refute violation witnesses of reachability safety, memory safety, memory cleanup, and overflow properties of sequential C programs.

### 1 Witness Validation Approach

The basic principle of the witness validator Symbiotic-Witch 2 remains the same as in the previous version of the tool [1], i.e., it symbolically executes [9] the given program along execution paths specified by the corresponding witness. The substantial differences were induced by a more precise interpretation of violation witnesses and by the community decision to support witness refutation.

We originally thought that every node of a witness automaton has an implicit self-loop that can be taken under each program instruction. After SV-COMP 2022, we learnt that the implicit self-loop of a node q can be used only by edges of control flow automata (CFA) that are "either


This definition is problematic in particular because it refers to CFA and there is no standardized translation of C programs to CFA. Especially the case (b) heavily depends on the granularity of constructed CFA as it refers to adjacent edges. As the semantics of verification witnesses has to be unambiguous, we have convinced the community that the case (b) should be removed from the semantics. Still, the case (a) is viable and it considerably reduces the applicability of implicit self-loops.

<sup>⋆</sup> This work has been supported by the Czech Science Foundation grant GA23-06506S.

Symbiotic-Witch 2 works as follows. It reads a given violation witness and the corresponding program. The program is symbolically executed and every state of symbolic execution is accompanied by the set of witness automaton nodes that are reached by the executed program path. Note that these sets are dramatically smaller than in the previous version of our tool due to the more precise semantics of implicit self-loops. If the set does not contain any node except sink nodes, the symbolic execution of the corresponding path is stopped. This brings a significant speed up compared to the previous version of our tool where this situation cannot happen.

Another significant difference to the previous version is the handling of state-space guards of a given witness. Consider a symbolic execution state and the associated set of witness automata nodes. Further, assume that the next instruction processed by the symbolic execution matches the source-code guards of some automata edges leading from the set of nodes. For each state-space guard of these edges, we create a fork of symbolic execution and restrict the next symbolic execution state to satisfy the state-space guard. The set of nodes accompanying the restricted symbolic execution state contains only target nodes of the edges with the enforced state-space guard. Note that the previous version of our validator ignores state-space guards unless the witness automaton contains a single path from the entry node to the violation node.
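The node-set tracking and the forking on state-space guards can be illustrated with a toy model in Python. This is a strongly simplified sketch and not the Witch-Klee implementation: source-code guards are reduced to a line number, state-space guards are opaque strings that would be handed to the symbolic executor, and the precise rules for implicit self-loops are replaced by the assumption that the tracked set stays unchanged when no edge matches. All names (`step`, `should_stop`, the node labels) are hypothetical.

```python
from collections import defaultdict


def step(nodes, edges, line):
    """Advance the tracked witness nodes over one instruction at `line`.

    nodes: set of witness automaton nodes tracked for the current state
    edges: dict mapping a node to a list of (line, state_guard, target)
    Returns a dict mapping each state-space guard to the node set of the
    forked execution state that is restricted by that guard.
    """
    forks = defaultdict(set)
    for node in nodes:
        for edge_line, state_guard, target in edges.get(node, []):
            if edge_line == line:  # the source-code guard matches
                forks[state_guard].add(target)
    if not forks:
        # Simplification: no edge matches, so the tracked set is unchanged.
        return {None: set(nodes)}
    return dict(forks)


def should_stop(nodes, sinks):
    """Exploration of a path stops once only sink nodes are tracked."""
    return nodes <= sinks


if __name__ == "__main__":
    edges = {
        "q0": [(3, "x == 1", "q1"), (3, "x == 2", "q2")],
        "q1": [(5, None, "violation")],
        "q2": [(5, None, "sink")],
    }
    print(step({"q0"}, edges, 3))  # one fork per state-space guard
    print(should_stop({"sink"}, {"sink"}))
```

In the demo, executing line 3 from `q0` yields two forks, one restricted by `x == 1` and tracking `q1`, the other restricted by `x == 2` and tracking `q2`; the second fork reaches only a sink node at line 5 and would be abandoned.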

If the symbolic execution detects a violation of the considered property and the tracked set of witness automata nodes contains a violation node, the witness is confirmed. The witness is refuted if


The witness automata use various attributes to specify source-code guards (saying which instructions correspond to a given witness automaton edge) and state-space guards (restrictions on program states). Symbiotic-Witch 2 supports only selected attributes for source-code guards, namely the line number of executed instructions, the information whether true or false branch is taken, and the information about entering a function or returning from a function. Regarding the state-space guard, our tool uses only the return values of the \_\_VERIFIER\_nondet\_\* functions. The limited support of attributes means that our tool can misinterpret a given witness automaton, i.e., it can consider some execution path to be represented by the automaton even if it is not, and vice versa. In practice, this is not a big issue as many verification tools produce violation witnesses with only the supported attributes and some other tools use unsupported attributes to provide additional information (like offset of an instruction in the source code) that typically does not change the represented set of execution paths.

#### 2 Software Architecture

The tool Symbiotic-Witch 2 is integrated into the Symbiotic framework [7] and it can be roughly divided into two components. The first component is a set of Python scripts (many of them shared with other Symbiotic tools) that preprocess the code. More precisely, they set the options for optimisations and Clang sanitizer depending on the considered property, translate the given C program into llvm intermediate representation via Clang, and link necessary function definitions.

The second component called Witch-Klee takes the preprocessed program and the witness, and it runs the actual witness validation. Witch-Klee is derived from the symbolic executor JetKlee, which is a fork of Klee [6] used in the Symbiotic framework. Witch-Klee employs RapidXML for parsing witnesses in the GraphML format [5] and Z3 [10] as the SMT solver in symbolic execution.

Both components of Symbiotic-Witch 2 run on llvm 10.0.1.

#### 3 Strengths and Weaknesses

On the positive side, Symbiotic-Witch 2 can efficiently handle violation witnesses providing return values of \_\_VERIFIER\_nondet\_\* functions as well as those describing execution paths by taken branches.

Further, if Symbiotic-Witch 2 confirms a witness containing only attributes supported by the tool, then the witness is indeed valid. If Symbiotic-Witch 2 confirms a witness with some attributes not supported by the tool, then the program really violates the considered property and this violation can, but does not have to, be represented by the witness. If Symbiotic-Witch 2 refutes a witness, then this witness is indeed invalid. The only exception is the case when the program contains some inner nondeterminism that is lost by the translation to llvm. For example, consider a program that contains a test f(x) < g(x). According to the C standard, the functions f(x) and g(x) can be evaluated in any order. If a violation witness prescribes one order of evaluation and Clang translates the program such that the functions are evaluated in the opposite order, then the witness can be refuted even if it is correct. We can construct such a witness, but we have not yet come across any of these in practice. We plan to extend our tool with a check for this kind of inner nondeterminism in order to guarantee the correctness of refutation answers.

Our tool also has some weaknesses. Some of them come from the fact that we do not support all possible attributes of witnesses. We decided not to invest more effort to support other attributes as we expect the witness format to be revised soon due to detected issues in its semantics. In spite of this, the tool correctly confirmed 35536 and refuted 3108 violation witnesses of SV-COMP 2023. On the negative side, the tool also confirmed 10 witnesses of memory safety violation marked as invalid. Nine of these incorrect validation results stem from two verification tasks where our symbolic executor reported a valid-memtrack violation while the tasks are marked true for this property.

Symbiotic-Witch 2 struggles to evaluate two specific classes of witnesses. The first class are the witnesses for the programs in the ECA subcategory. These generated artificial programs are hard to compile and optimize. Thus, our tool sometimes runs out of time during the code preprocessing phase.

The second class are the witnesses that contain edges describing declarations and initializations of global variables (e.g., some witnesses produced by Ultimate Automizer [8]). Our algorithm processes these declarations and initializations in a separate step and starts the symbolic execution of a given program (and thus also the witness tracking) in the function main. This means that the witness tracking cannot pass any witness edge representing instructions that are not reachable from main. Hence, Symbiotic-Witch 2 can refute some witnesses of the second class even if it finds the property violations they represent. This issue can be seen as another consequence of the fact that the semantics of witnesses is formulated over CFA and the translation of C programs to CFA is not given.

### 4 Tool Setup and Configuration

The archive with Symbiotic-Witch 2 is available in the SV-COMP archives. To run the validator, use the command

`./symbiotic [--prp <prop>] [--32 | --64] --witness-check <witness> <prg>`

where `<witness>` is a violation witness in the GraphML format, `<prg>` is the corresponding C program, and `<prop>` is the considered property. The property can be supplied as a .prp file or one of the following shortcuts: no-overflow, valid-memsafety, or valid-memcleanup. The default property is unreachability of the function `reach_error()`. The switches `--32` and `--64` specify the considered architecture, 64-bit being the default.
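When the validator is called from a script, the command line can be assembled programmatically. The following Python helper is a convenience sketch that uses only the switches documented above; the function name `build_command`, its parameters, and the file names in the demo are hypothetical and not part of Symbiotic.

```python
def build_command(witness, program, prop=None, bits=None, symbiotic="./symbiotic"):
    """Assemble the argument list for a Symbiotic-Witch 2 validation run.

    prop: path to a .prp file or one of the shortcuts named above;
          None keeps the default property (unreachability of reach_error()).
    bits: 32 or 64; None keeps the default architecture (64-bit).
    """
    cmd = [symbiotic]
    if prop is not None:
        cmd += ["--prp", prop]
    if bits is not None:
        if bits not in (32, 64):
            raise ValueError("bits must be 32 or 64")
        cmd.append(f"--{bits}")
    cmd += ["--witness-check", witness, program]
    return cmd


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(build_command("witness.graphml", "program.c",
                                 prop="no-overflow", bits=32)))
```

The returned list can be passed to `subprocess.run` from the Python standard library; passing the arguments as a list avoids shell quoting problems with file names that contain spaces.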

Both components of the tool are also available on GitHub with build instructions in the respective README.md files. To start validation, build each component separately, add the path to the built witch-klee executable to \$PATH and run Symbiotic as previously described.

### 5 Software Project and Contributors

Symbiotic-Witch 2 has been developed at the Faculty of Informatics, Masaryk University by Paulína Ayaziová under the guidance of Jan Strejček. The tool is available under the MIT license and all used tools and libraries (llvm, Klee, Z3, RapidXML, Symbiotic) are also available under open-source licenses that comply with SV-COMP's policy for the reproduction of results. The source code of Witch-Klee (the competing version tagged SV-COMP23) can be found at:

https://github.com/ayazip/witch-klee

The source code of the respective version of Symbiotic is available at:

https://github.com/staticafi/symbiotic/tree/witch-klee

Data Availability Statement. All data of SV-COMP 2023 are archived as described in the competition report [3] and available on the competition web site. This includes the verification tasks, results, witnesses, scripts, and instructions for reproduction. The version of Symbiotic-Witch 2 used in the competition is archived together with other participating tools [4] or separately [2].

### References


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## 2LS: Arrays and Loop Unwinding (Competition Contribution)

Viktor Malík<sup>3,⋆</sup>, František Nečas<sup>3</sup>, Peter Schrammel<sup>1,2</sup>, and Tomáš Vojnar<sup>3</sup>

<sup>1</sup> Diffblue Ltd., Oxford, UK

<sup>2</sup> University of Sussex, Sussex, UK

<sup>3</sup> Brno University of Technology, FIT, Brno, Czech Republic<sup>⋆⋆</sup> imalik@fit.vut.cz

Abstract. 2LS is a C program analyser built upon the CPROVER infrastructure that can verify and refute program assertions, memory safety, and termination. Until now, one of the main drawbacks of 2LS was its inability to verify most programs with arrays. This paper introduces a new abstract domain in 2LS for reasoning about the contents of arrays. In addition, we introduce an improved approach to loop unwinding, a crucial component of the 2LS verification algorithm, which particularly enables finding proofs and counterexamples for programs working with dynamic memory.

### 1 Overview

2LS is a static analysis and verification tool for sequential C programs. At its core, it uses the kIkI algorithm (k-invariants and k-induction) [2], which integrates bounded model checking, k-induction, and abstract interpretation into a single, scalable framework. kIkI relies on incremental SAT solving in order to find proofs and refutations of assertions, as well as to perform (non)termination analysis [3].

One of the core mechanisms of kIkI is incremental loop unwinding. However, the original unwinding approach that 2LS used was not compatible with the memory model developed in [6]. Hence, in the first part of this paper, we introduce a new approach to loop unwinding [9] that supports programs manipulating dynamic memory and hence allows 2LS to verify programs that could not be handled before.

The abstract interpretation part of kIkI features multiple abstract domains for reasoning about various data structures in programs. In particular, the competition version of 2LS uses the interval domain for numerical values and our custom heap domain for describing the shape of the heap. Arrays are a common data structure that 2LS could not handle in the past. Therefore, in the second part of this paper, we introduce a new array abstract domain capable of reasoning about the content of arrays.

Architecture. The architecture of 2LS has been described in previous competition contributions [10,7,8]. In brief, 2LS is built upon the CPROVER infrastructure [4] and thus uses *GOTO programs* as the internal program representation. The analysed program is first translated into a static single assignment (SSA) form. Then, inductive invariants in various abstract domains are computed for the program's loops. Last, the SSA form and the invariants are bit-blasted into a propositional formula and given to a SAT solver which is used to reason about the program's properties.

© The Author(s) 2023. S. Sankaranarayanan and N. Sharygina (Eds.): TACAS 2023, LNCS 13994, pp. 529–534, 2023. https://doi.org/10.1007/978-3-031-30820-8_31

<sup>⋆</sup> Jury member

<sup>⋆⋆</sup> The Czech authors were supported by the Czech Science Foundation project 23-06506S, the FIT BUT project FIT-S-23-8151, and the Horizon Europe project CHESS (id 101087529).

Software Project. 2LS is implemented in C++ and it is maintained by Peter Schrammel and Viktor Mal´ık with contributions by the community. The competition version uses Glucose 4.0 as its back-end SAT solver. 2LS competes in all C categories except Concurrency. See the previous competition report [8] for details on executing 2LS.

### 2 Loop Unwinding of Heap-Manipulating Programs

Whenever the kIkI algorithm is not able to verify or refute the program's properties for the given unwinding level, it incrementally unwinds the loops in order to compute a stronger invariant or to explore additional reachable program states [2]. 2LS' original unwinder unrolls the loops directly at the level of the program's SSA form. However, this approach is not compatible with the encoding of pointer operations that 2LS uses [6]. Hence, for this year's competition version of 2LS, we introduce a new approach to loop unwinding which overcomes these limitations and allows verifying heap-manipulating programs using k-induction and BMC.

Memory model in 2LS. Each call of malloc is replaced by a finite number of so-called *abstract dynamic objects* that over-approximate the (possibly unbounded) set of concrete dynamic objects allocated by that call. Subsequently, the conversion of pointer-dereferencing operations to the SSA form is based on a static *points-to* analysis which computes for each pointer p the set of memory objects that p can be dereferenced into. Reads and writes to memory through p are then encoded using a case-split of objects which p can point to in the program location of the given memory operation [6].

The points-to analysis is performed on the *GOTO program* (control-flow graph) prior to generating the SSA form. This approach poses a problem for the original unwinder when dealing with allocations inside loops. Each new unwinding of a loop may introduce a new call to malloc, effectively introducing new abstract dynamic objects. Such additions invalidate the previously computed points-to analysis since pointers may now also point to the new objects and, thus, operations via pointers must be re-encoded.
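The effect of a new unwinding on the pointer encoding can be illustrated with a toy model. The sketch below is not the 2LS encoding: the function names, the `#k` object naming, the `undef` placeholder, and the choice of one abstract object per malloc call are all assumptions made for illustration.

```python
from typing import List, Set


def abstract_objects(malloc_sites: List[str], unwindings: int) -> Set[str]:
    """Toy model: one abstract dynamic object per malloc call, where each
    unwinding of the loop body adds one more call per allocation site."""
    return {f"{site}#{k}" for site in malloc_sites for k in range(unwindings + 1)}


def encode_read(pointer: str, points_to: Set[str]) -> str:
    """Encode a read through `pointer` as a case split over its points-to set."""
    expr = "undef"  # hypothetical placeholder for an invalid dereference
    for obj in reversed(sorted(points_to)):
        expr = f"ite({pointer} == &{obj}, {obj}, {expr})"
    return expr


before = encode_read("p", abstract_objects(["do1"], unwindings=0))
after = encode_read("p", abstract_objects(["do1"], unwindings=1))
# before: ite(p == &do1#0, do1#0, undef)
# after:  ite(p == &do1#0, do1#0, ite(p == &do1#1, do1#1, undef))
```

The two encodings differ, which is why a points-to result computed before the unwinding can no longer be reused afterwards.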

Unwinding in the GOTO programs. Our new approach to loop unwinding unrolls the loops in the *GOTO program* representation instead of the SSA form. This allows us to update the set of abstract dynamic objects in the program as well as to compute the points-to analysis anew based on the newly introduced objects [9]. In order to facilitate verification in 2LS, there are multiple transformations that need to be done after the loops of the *GOTO program* are unwound. First, the k-induction algorithm of 2LS requires a special unwinding approach. Many state-of-the-art unwinders, including the unwinder from CPROVER that we use, copy the loop body and place it before the original loop (i.e., the unwound loop bodies are outside the loop). In contrast, 2LS requires all of the unwindings to be included in a single loop, i.e., the backwards edge of the not-yet-unwound part must go to the beginning of the topmost unwinding (instead of going to the top of the not-yet-unwound part) [2]. Hence, we must appropriately reconnect the backwards edges to fulfil this requirement and make our approach usable with the current algorithms of 2LS. Second, assertions inside the unwound loop bodies may be assumed to hold as they were verified in the previous iteration of the kIkI algorithm. Hence, 2LS converts such assertions into assumptions. We reflect this approach inside our new unwinding algorithm, cf. [9] for details.
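The two post-processing steps can be sketched on a toy control-flow graph. This is an illustration only, not the CPROVER or 2LS data structures: the node names `body_i` and `loop`, the edge-list representation, and the string-based statements are assumptions.

```python
from typing import List, Tuple

Edge = Tuple[str, str]


def unwind_prepend(k: int) -> List[Edge]:
    """Unwinding that places k copies of the loop body before the original loop."""
    chain = [f"body_{i}" for i in range(k)] + ["loop"]
    edges = list(zip(chain, chain[1:]))
    edges.append(("loop", "loop"))  # back edge stays on the not-yet-unwound part
    return edges


def reconnect_back_edge(edges: List[Edge], k: int) -> List[Edge]:
    """Redirect the back edge to the topmost unwinding so all copies form one loop."""
    head = "body_0" if k > 0 else "loop"
    return [(src, head) if (src, dst) == ("loop", "loop") else (src, dst)
            for src, dst in edges]


def weaken_assertions(body: List[str]) -> List[str]:
    """Assertions in unwound copies were checked earlier; turn them into assumptions."""
    return [s.replace("assert(", "assume(", 1) if s.startswith("assert(") else s
            for s in body]
```

For `k = 2`, `unwind_prepend` yields the edges `body_0 → body_1 → loop` with a self-loop on `loop`, and `reconnect_back_edge` replaces that self-loop by `loop → body_0`.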

Combining the two approaches. The proposed approach, while being sound when handling dynamic memory, introduces a noticeable performance degradation. Unwinding of loops in the *GOTO program* changes a great part of the generated SSA form which decreases the benefits of incremental SAT solving. To overcome this issue, we only enable the new unwinder when necessary, i.e., when dynamic memory is used in the analysed program. In addition, in our future work, we plan to improve our new unwinder to fully leverage incremental solving.

### 3 Array Domain

The core algorithm of 2LS, kIkI, uses abstract interpretation to infer k-inductive invariants in various abstract domains. The computed invariants are used to verify or refute the program's properties. Since the verification approach of 2LS is based on translating the program into a first-order formula to reason about its properties, the abstract domains in 2LS are required to have the form of a *template*, i.e., a parametrised, quantifier-free, first-order formula describing a relevant program property. 2LS already supports a handful of domains, such as the interval domain [2], a shape domain [6], or ranking domains [3] for termination analysis; however, a domain for describing the content of arrays has been missing, which limited the usability of 2LS on programs manipulating array structures. In this section, we propose such a domain.

In the literature, there exist a number of works on abstract domains for arrays. To exploit the seamless combination of abstract domains in 2LS, we found that perhaps the most suitable approach to draw inspiration from is [5], where each array is split into several parts, called *segments*, and a separate invariant is computed for every segment. The segment invariant can be computed in any domain supported by 2LS, usually selected based on the data type of the array elements (e.g., the interval domain for numerical values or the shape domain for pointers). In the rest of this section, we describe different aspects of our proposed domain. In all of the parts below, we assume that we compute a loop invariant of an array a. We use N<sub>a</sub> to denote the number of elements of a.

Array Segmentation. First, let us assume that we know the set of array indices, so-called *segment borders*, for an array a which we denote B<sup>a</sup> (see below on the way this set is obtained). When splitting a into segments, we distinguish two situations:

1. Indices from B<sup>a</sup> cannot be totally ordered. In such a case, we create multiple segmentations, one for each b ∈ B<sup>a</sup>:

$$\{0\}\ S_1^b\ \{b\}\ S_2^b\ \{b+1\}\ S_3^b\ \{N_a\}. \tag{1}$$

2. Indices from B<sup>a</sup> can be totally ordered s.t. b<sub>1</sub> ≤ · · · ≤ b<sub>n</sub>. In such a case, we create a single segmentation for the entire a:

$$\{0\}\ S_1\ \{b_1\}\ S_2\ \{b_1+1\}\ S_3\ \{b_2\}\ \dots\ \{b_n\}\ S_{2n}\ \{b_n+1\}\ S_{2n+1}\ \{N_a\}. \tag{2}$$

A single array segment S denoted {b<sub>l</sub>} S {b<sub>u</sub>} represents an abstraction of the elements of a between the indices b<sub>l</sub> (inclusive) and b<sub>u</sub> (exclusive). For each S, we define two special variables: (1) the *segment element variable* elem<sup>S</sup> being an abstraction of the array elements contained in S and (2) the *segment index variable* idx<sup>S</sup> being an abstraction of the indices of the array elements contained in S.
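The two segmentation shapes of Eq. (1) and Eq. (2) can be sketched as follows. This is a minimal illustration under stated assumptions: borders are kept as plain strings, the function names and the `(lower, name, upper)` triple are hypothetical, and deciding whether the borders can be totally ordered is left to the caller.

```python
from typing import Dict, List, Tuple

Segment = Tuple[str, str, str]  # (lower border, segment name, upper border)


def segmentation_ordered(borders: List[str], size: str = "N_a") -> List[Segment]:
    """Single segmentation of Eq. (2); `borders` must be given as b_1 <= ... <= b_n."""
    points = ["0"]
    for b in borders:
        points += [b, f"{b}+1"]
    points.append(size)
    return [(lo, f"S_{i}", hi)
            for i, (lo, hi) in enumerate(zip(points, points[1:]), start=1)]


def segmentations_unordered(borders: List[str], size: str = "N_a") -> Dict[str, List[Segment]]:
    """One three-segment segmentation per border b, as in Eq. (1)."""
    result: Dict[str, List[Segment]] = {}
    for b in borders:
        points = ["0", b, f"{b}+1", size]
        result[b] = [(lo, f"S_{i}^{b}", hi)
                     for i, (lo, hi) in enumerate(zip(points, points[1:]), start=1)]
    return result
```

With n ordered borders, the first function returns 2n + 1 segments, matching the indices S<sub>1</sub> to S<sub>2n+1</sub> in Eq. (2).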

Array Template. Having the set of program arrays Arr and the set of segments S<sup>a</sup> for each a ∈ Arr, we define the array domain template as:

$$\mathcal{T}^A \equiv \bigwedge_{a \in Arr} \bigwedge_{S \in S^a} \left( G^S \Rightarrow \mathcal{T}^{in}(elem^S) \right) \tag{3}$$

where T<sup>in</sup> is the inner domain template (over the inner elements of S abstracted by elem<sup>S</sup>) and G<sup>S</sup> is the conjunction of guards associated with the segment S. The purpose of G<sup>S</sup> is to make sure that the inner invariant is limited to the elements of the given segment {b<sub>l</sub>} S {b<sub>u</sub>}. In particular, G<sup>S</sup> is a conjunction of several guards:

$$b_l \le idx^S < b_u \land 0 \le idx^S < N_a \land elem^S = a[idx^S] \tag{4}$$

where the first conjunct ensures that the segment index variable stays between the segment borders, the second conjunct makes sure that the segment index variable stays between the array borders (since segment borders are generic expressions, they may lie outside of the array), and the last conjunct binds the segment element variable to the segment index variable. Using the above template, 2LS is able to compute a different invariant for each segment. For example, for a typical array iteration loop, this would allow 2LS to infer a different invariant for the part of the array that has already been traversed than for the part of the array that is still to be visited.
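To make the role of the guards concrete, the sketch below evaluates them on a single concrete array state and computes the tightest interval for each segment. It is an illustration only: 2LS infers invariants symbolically over all loop iterations, whereas this code inspects one snapshot, and the function names are hypothetical.

```python
from typing import List, Optional, Tuple


def guard(idx: int, lower: int, upper: int, size: int) -> bool:
    """First two conjuncts of Eq. (4): idx inside the segment and inside the array."""
    return lower <= idx < upper and 0 <= idx < size


def segment_interval(a: List[int], lower: int, upper: int) -> Optional[Tuple[int, int]]:
    """Tightest [lo, hi] such that the guard implies lo <= elem <= hi on `a`."""
    # The third conjunct of Eq. (4) binds the element to the index: elem = a[idx].
    elems = [a[idx] for idx in range(len(a)) if guard(idx, lower, upper, len(a))]
    return (min(elems), max(elems)) if elems else None


# Snapshot of `for (i = 0; i < N; i++) a[i] = 0;` at the loop head with i = 3.
a, i = [0, 0, 0, 7, -2, 9], 3
traversed = segment_interval(a, 0, i)           # (0, 0)
current = segment_interval(a, i, i + 1)         # (7, 7)
remaining = segment_interval(a, i + 1, len(a))  # (-2, 9)
```

The already traversed part gets the precise interval [0, 0], while the part still to be visited keeps a wider one, which is the distinction the segmentation is designed to capture.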

Computing Array Segment Borders. Since 2LS requires the template formula to be fixed at the beginning of the analysis, the set of segments must be pre-computed. The main idea of our approach is that the segment borders should be closely related to the expressions that are used to access array elements in the analysed program. Therefore, we perform a static *array index analysis* which collects the set of all expressions occurring as array access indices (i.e., inside the square bracket operators). Once the analysis is complete, for each array a, we determine the set of its segment borders by taking the set of all index expressions used to write into a in the corresponding loop.
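The selection of borders from the collected index expressions can be sketched as below. The `Access` record and the function name are hypothetical, and the sketch assumes that the array accesses of the loop have already been extracted from the program.

```python
from typing import Dict, List, NamedTuple, Set


class Access(NamedTuple):
    array: str
    index: str      # index expression as written between the square brackets
    is_write: bool


def segment_borders(loop_accesses: List[Access]) -> Dict[str, Set[str]]:
    """Per array, the index expressions used to write into it in the given loop."""
    borders: Dict[str, Set[str]] = {}
    for acc in loop_accesses:
        if acc.is_write:
            borders.setdefault(acc.array, set()).add(acc.index)
    return borders


# Accesses of a loop body such as: a[i] = b[i+1]; a[j] = 0;
accesses = [Access("a", "i", True), Access("b", "i+1", False), Access("a", "j", True)]
```

Here `segment_borders(accesses)` yields the borders `i` and `j` for `a` and none for `b`, since `b` is only read.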

### 4 Strengths and Weaknesses

For general strengths and weaknesses of 2LS, we refer to the previous competition contribution [8]. The two major improvements described in the previous sections increase the number of programs correctly verified by this year's version of 2LS. The new loop unwinding approach allows us to use the BMC part of the kIkI algorithm for programs manipulating dynamic memory, which particularly enables us to find counterexamples occurring in higher loop iterations, as well as to verify programs for which the initially computed invariant is not sufficiently strong and the loops can be unwound completely. This is most notable in the heap-related categories (*MemSafety-Heap*, *MemSafety-LinkedLists*, and *ReachSafety-Heap*) where the number of the *correct true* and the *correct false* results increased from 110 to 177 and from 51 to 82, respectively. The new array domain allowed us to score points in array-related categories, which was not possible before (e.g., 2LS correctly solved 17 tasks in *ReachSafety-Arrays* compared to 2 from the previous years, which 2LS managed by chance)<sup>1</sup>.

Still, there remain a number of limitations. The array domain is rather simple and cannot verify many array-manipulating programs. In addition, as we described earlier, the new unwinder cannot make use of incremental SAT solving efficiently.

### 5 Data-Availability Statement

2LS is publicly available from https://www.github.com/diffblue/2ls, under a BSD-style license. The competition version is based on version 0.9.6 and the archive used in the competition is available from https://doi.org/10.5281/zenodo.7467706 or from the collection of all verifiers and validators participating in SV-COMP 2023 [1].

#### References


<sup>1</sup> A number of tasks were disqualified from SV-COMP 2023 at the last minute due to past-deadline changes, which were often related to the tasks being added to new categories (e.g., *NoOverflows*) rather than actual modifications of the tasks or their verdicts. Hence, we present results from the entire benchmark instead of the (limited) competition benchmark set as those results are more representative and can be better compared to the previous year's results.


## Bubaak: Runtime Monitoring of Program Verifiers<sup>⋆</sup> (Competition Contribution)

Marek Chalupa<sup>⋆⋆</sup> and Thomas A. Henzinger

Institute of Science and Technology Austria (ISTA), Klosterneuburg, Austria mchalupa@ista.ac.at

Abstract. The main idea behind Bubaak is to run multiple program analyses in parallel and use runtime monitoring and enforcement to observe and control their progress in real time. The analyses send information about (un)explored states of the program and discovered invariants to a monitor. The monitor processes the received data and can force an analysis to stop the search of certain program parts (which have already been analyzed by other analyses), or to make it utilize a program invariant found by another analysis.

At SV-COMP 2023, the implementation of data exchange between the monitor and the analyses was not yet completed, which is why Bubaak only ran several analyses in parallel, without any coordination. Still, Bubaak won the meta-category FalsificationOverall and placed very well in several other (sub)-categories of the competition.

### 1 Verification Approach

Runtime monitoring (RM) [1] is a lightweight approach to observing the executions of software systems and analyzing their behavior. The system (for simplicity, consider a single program) is executed and observed to obtain a trace of events. The observed events carry information about (a subset of) actions that have been performed by the program, like accesses to memory, calls of functions, or writing a text to the standard output. The trace is analyzed by the monitor that outputs verdicts, be it verdicts about some correctness property of the program or, e.g., information about resource consumption. Runtime enforcement [12] goes a step further and allows the monitor to alter the behavior of the program upon seeing some event or detecting a certain (usually faulty) behavior of the program.

RM is traditionally applied as a complementary method to static analysis to find bugs in computer programs. In Bubaak, we use RM to do monitoring and enforcement of the verifiers instead of the analyzed program itself. The verifiers are manually modified to emit events about their internal actions, for example, that they have reached some part of the analyzed code or that they have discovered an invariant. The monitor gathers and analyzes these events and can decide to command a verifier to stop a search of some parts of a program or to take into account an invariant found by another verifier.

<sup>⋆</sup> This work was supported by the ERC-2020-AdG 10102009 grant.

<sup>⋆⋆</sup> Jury member

### 2 Bubaak at SV-COMP 2023

At SV-COMP 2023 [2], the verifiers that we used are based on forward and backward symbolic execution.

(Forward) symbolic execution (SE) [14] is well-known for being efficient in searching for bugs. It aims to explore every feasible execution path of the analyzed program by building the so-called symbolic execution tree. Such an approach must fail if the SE tree is infinite or very large, in which case we talk about the path explosion problem. There are ways to prune from the SE tree those paths that are known to exclude buggy behavior, e.g., using interpolation [13].

Backward symbolic execution (BSE) [11] is a form of SE that searches the program backwards from error locations towards the initial locations. It has been shown [11] that BSE is equivalent to k-induction [16], another popular but incomplete verification technique. The incompleteness of BSE (k-induction) is caused by the lack of information about reachable states. This deficiency can be tackled by providing (often trivial) invariants that supplement the missing information [5]. These invariants can be computed externally before running BSE, or they can be computed on the fly [5,4,11]. One of the on-the-fly methods is loop folding and the resulting technique is called BSELF [11].

SE and BSE(LF) are well suited for analyzing safety properties, but are not suited for analyzing the termination of programs. To analyse this property, we have developed a new algorithm that has not been published yet and that we dubbed TIIP: termination with inductive invariants with progress. This algorithm runs SE, searching for non-terminating executions by remembering and comparing program states visited at loop headers. At the same time, it tries to incrementally (using a procedure similar to loop folding) compute an inductive invariant with progress for each visited loop. This invariant, if found, gives a pre-condition for the loop termination.
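Since TIIP itself is unpublished, only the state-comparison idea described above can be illustrated. The sketch below runs one concrete (not symbolic) execution of a deterministic loop and reports non-termination when a state repeats at the loop header; the function name and the three result strings are hypothetical, and the computation of inductive invariants with progress is not shown.

```python
from typing import Callable, Hashable, Set


def check_loop(state: Hashable,
               cond: Callable[[Hashable], bool],
               body: Callable[[Hashable], Hashable],
               max_iters: int = 1000) -> str:
    """Execute `while cond(s): s = body(s)` and compare states at the loop header."""
    seen: Set[Hashable] = set()
    for _ in range(max_iters):
        if not cond(state):
            return "terminates"
        if state in seen:
            # A deterministic loop that revisits a header state cycles forever.
            return "non-terminating"
        seen.add(state)
        state = body(state)
    return "unknown"  # budget exhausted without a verdict
```

For example, `check_loop(0, lambda x: x != 3, lambda x: (x + 2) % 4)` visits the states 0, 2, 0 and therefore reports a non-terminating execution.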

At SV-COMP 2023, we run in parallel two SE instances and one BSELF instance when checking properties unreach-call and no-overflow, SE and TIIP when checking termination, and just SE for memory safety properties. Using multiple SE instances at the same time makes sense because we use different verifiers (see Section 3) and their SE implementations support different features.

Because all the algorithms that we use are based on symbolic execution, the enforcement done by the monitor would effectively do a pruning of SE and BSE trees. Unfortunately, we have not managed to sufficiently debug this pruning and therefore it was disabled in the competition. As a result, Bubaak at SV-COMP 2023 only runs analyses in parallel without any coordination.

#### 3 Software Architecture

The high-level scheme of Bubaak for SV-COMP 2023 is shown in Figure 1. Bubaak takes as input C files and the property file. Internally, it compiles and links the input files into a single llvm bitcode file [7], which is also instrumented using the UBSan sanitizer [18] if the checked property is no-overflow. Then, verifiers are spawned according to the given property. All verifiers run in parallel (when there is more than one). At SV-COMP 2023, we used Slowbeast for SE, BSELF, and TIIP, and BubaaK-LEE as another instance of SE<sup>1</sup>.

Fig. 1. The setup of Bubaak at SV-COMP 2023. The colors indicate the properties that were checked by the different tools and algorithms.

Slowbeast [17] is a symbolic executor written in Python. It supports checking properties unreach-call and no-overflow with SE, BSE, and BSELF, and termination with TIIP. The tool has no or only very limited support for properties no-data-race, valid-memsafety, and valid-memcleanup.

BubaaK-LEE is a fork of symbolic executor Klee [9] which is implemented in C++ and the current version is a merge of the upstream Klee and JetKLEE (the fork of Klee used in the tool Symbiotic [10]) with additional modifications. These modifications mostly concern modeling standard C functions but include also partial support for 128-bit wide integers and support for global variables with external linkage. BubaaK-LEE implements SE without any SE tree pruning and can check for all SV-COMP properties except for no-data-race.

Both symbolic executors use Z3 [15] as the SMT solver. The features they support differ significantly, though. For example, Slowbeast supports, apart from BSE(LF) and TIIP, symbolic floating-point computations, threaded programs, and incremental solving, while it does not support symbolic pointers and addresses which are features supported by BubaaK-LEE.

The monitor is currently a part of the control scripts written in Python and at SV-COMP 2023 it monitors only the standard (error) output of the tools as monitoring anything else is redundant until the implementation of data exchange between verifiers and the monitor is finished. The only enforcement that it does at SV-COMP 2023 is terminating the analysis entirely.
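The uncoordinated setup described above (parallel verifiers, a monitor that only watches their standard output and terminates the analysis) can be sketched as follows. This is not Bubaak's control script: the function names, the verdict strings `true` and `false`, and the demo commands are hypothetical stand-ins.

```python
import queue
import subprocess
import sys
import threading
import time
from typing import Dict, List, Optional, Tuple

VERDICTS = {"true", "false"}  # assumed verdict lines printed by a verifier


def _pump(name: str, proc: subprocess.Popen, events: queue.Queue) -> None:
    """Forward each output line of one verifier to the monitor."""
    for line in proc.stdout:
        events.put((name, line.strip()))
    events.put((name, None))  # end of output


def run_portfolio(commands: Dict[str, List[str]],
                  timeout: float = 60.0) -> Tuple[str, Optional[str]]:
    """Run all commands in parallel; stop everything on the first verdict."""
    procs = {n: subprocess.Popen(c, stdout=subprocess.PIPE, text=True)
             for n, c in commands.items()}
    events: queue.Queue = queue.Queue()
    for n, p in procs.items():
        threading.Thread(target=_pump, args=(n, p, events), daemon=True).start()
    running, result = set(procs), ("unknown", None)
    deadline = time.monotonic() + timeout
    try:
        while running:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            try:
                name, line = events.get(timeout=remaining)
            except queue.Empty:
                break
            if line is None:
                running.discard(name)
            elif line in VERDICTS:
                result = (line, name)
                break
    finally:
        # The only enforcement: terminate the whole analysis.
        for p in procs.values():
            if p.poll() is None:
                p.kill()
            p.wait()
    return result


# Hypothetical stand-ins for real verifier invocations.
demo = {
    "fast": [sys.executable, "-c", "print('false')"],
    "slow": [sys.executable, "-c", "import time; time.sleep(30); print('true')"],
}
```

Calling `run_portfolio(demo)` returns as soon as the fast process prints its verdict and kills the slow one, so wall time is bounded by the quickest conclusive analysis.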

Differences to Symbiotic. The tool Symbiotic [10] also uses Slowbeast and a fork of Klee, and therefore a discussion of the differences between Bubaak and Symbiotic is in order. The version of Slowbeast used in Symbiotic is outdated while Bubaak uses the most up-to-date version (at the time of writing the paper) where a substantial part of the code has been rewritten and that contains new features including the implementation of TIIP. The relation between BubaaK-LEE and JetKLEE is mentioned earlier in this section.

Other differences between Bubaak and Symbiotic exist: Bubaak does not use any pre-analyses, slicing, or instrumentation (apart from the instrumentation by UBSan for the property no-overflow, where Symbiotic uses its own instrumentation), and it runs the verifiers in parallel, while Symbiotic uses a sequential composition [10].

<sup>1</sup> Because these verifiers do not compete at SV-COMP 2023 on their own, this does not make Bubaak a meta-verifier.

Table 1. Number of benchmarks decided by individual verifiers per property.

### 4 Strengths and Weaknesses

The combination of SE and BSELF has been previously shown to be promising [11] because SE can quickly analyse many programs and BSELF then solves hard safe instances where SE found no bug or was unable to enumerate all paths. Running TIIP in parallel with pure SE has similar advantages. Still, all of SE, BSELF, and TIIP can be computationally very demanding as the number of executions they must search may be enormous and/or their exploration may involve lots of non-trivial queries to the SMT solver.

Running multiple verifiers in parallel reduces the wall-time while eating CPU time rapidly, which may be a disadvantage in SV-COMP. A remedy for this should be finishing the data exchange support between verifiers, which will allow us to avoid burning CPU time on duplicate tasks.

### 5 Results of Bubaak at SV-COMP 2023

The results of Bubaak were highly influenced by bugs in the implementation. The tool had 41 wrong answers, 31 of these caused by a mistake in parsing of the output of BubaaK-LEE (25 for the property valid-memcleanup and 6 for the property termination). The remaining 10 wrong answers were caused by miscellaneous bugs. After normalizing scores, these 41 wrong answers resulted in losing almost 10000 points in the overall score.

Also, BSELF did not decide a single benchmark because of a mistake in command line arguments when invoking it. Therefore, running Slowbeast was useful mainly in the category Termination where TIIP was able to solve roughly half of the decided benchmarks (in the rest of cases, BubaaK-LEE successfully enumerated all execution paths). The numbers of decided benchmarks are summarized in Table 1.

Overall, Bubaak won the category FalsificationOverall, which confirms that SE is very good at finding bugs. The tool also scored silver in the category SoftwareSystems, where it was also the leading tool in several sub-categories.

Data Availability Statement. The version of Bubaak that competed at SV-COMP 2023 is available at Zenodo [3,6]. The source code of Bubaak is available on GitHub [8].

### References



## EBF 4.2: Black-Box Cooperative Verification for Concurrent Programs (Competition Contribution)

Fatimah Aljaafari<sup>1,2</sup>, Fedor Shmarov<sup>1</sup>, Edoardo Manino<sup>1</sup>, Rafael Menezes<sup>1</sup>, and Lucas C. Cordeiro<sup>1</sup>

<sup>1</sup> Department of Computer Science, The University of Manchester, Manchester M13 9PL, UK fatimahaljaafari@gmail.com

<sup>2</sup> Department of Computer Networks & Communications, CCSIT, King Faisal University, Al Hassa 31982, SA

Abstract. Combining different verification and testing techniques together could, at least in theory, achieve better results than each individual one on its own. The challenge in doing so is how to take advantage of the strengths of each technique while compensating for their weaknesses. *EBF* 4.2 addresses this challenge for concurrency vulnerabilities by creating Ensembles of Bounded model checkers and gray-box Fuzzers. In contrast with portfolios, which simply run all possible techniques in parallel, *EBF* strives to obtain closer cooperation between them. This goal is achieved in a black-box fashion. On the one hand, the model checkers are forced to provide seeds to the fuzzers by injecting additional vulnerabilities in the program under test. On the other hand, off-the-shelf fuzzers are forced to explore different interleavings by adding lightweight instrumentation and systematically re-seeding them.

### 1 Overview

Finding vulnerabilities in concurrent programs presents the combined challenge of exploring the search space of program inputs and execution schedules, or *interleavings*. Recently, there have been attempts at solving complex verification problems by combining different techniques into hybrid verification tools [1,2,3].

More generally, these attempts belong to a larger trend in automated software analysis called *cooperative verification* [4,5]. In this paradigm, the main idea is implementing some form of communication interface between different tools (i.e., a common information exchange format), which allows the exchange of partial results (artifacts). In this way, we can harness the strengths of multiple verification techniques and solve more complex problems [6,7,8].

In *EBF* [9], we are the first to implement a cooperative approach that combines Bounded Model Checking (BMC) and concurrency-aware Gray-Box Fuzzing (GBF) for finding vulnerabilities in concurrent C programs. In order to simplify the communication interface between the cooperating tools, we adopt a *black-box* design philosophy where verification artifacts are implicitly shared via appropriate transformation and instrumentation of the program under test (PUT). The advantage of this design philosophy is its universality: in fact, *EBF* can incorporate any BMC or GBF tool that takes a C program as input.

Fig. 1: The workflow of *EBF 4.2* comprises four stages (dashed rectangles). The safety proving and seed generation stages use a BMC tool. The falsification stage uses our *OpenGBF* tool. The result aggregation stage generates a verification verdict and counter-example (if any). Areas of improvement over *EBF 4.0* [9] are shown in blue.

More specifically, *EBF 4.2* expands the cooperative verification capabilities of previous versions of *EBF*. First, we introduce a new seed generation module for the GBF. This module works by injecting additional vulnerabilities in critical areas of the PUT, and then using a BMC engine to generate program inputs that trigger them. These inputs represent higher quality seeds for the fuzzer than randomly-generated ones. Second, we propose an improved light-weight instrumentation based on the Clang/LLVM toolchain that turns any compatible off-the-shelf GBF into a concurrency-aware fuzzer. We do so by injecting fuzzer-controlled delays in the PUT, which implicitly force the exploration of different interleavings.

### 2 Architecture

Figure 1 illustrates the workflow of *EBF*, which comprises four verification stages: safety proving, seed generation, falsification and results aggregation. Each of these stages takes a concurrent C program and a given safety property as an input.

Safety Proving Stage. During this stage, *EBF* calls the BMC engine with the given inputs. The BMC tool produces one of the three possible *verdicts*: *Safe* if the model checker deems the PUT safe with respect to the given property, *Bug* if a vulnerability is detected, or *Unknown* encompassing a variety of different outcomes including reaching a timeout, running out of memory, or crashing unexpectedly. If the BMC tool finds a bug, it generates a counter-example – a sequence of program inputs and a thread schedule leading to the vulnerability. The input values are stored for later use as a seed.

Seed Generation Stage. This is a new feature of *EBF 4.2*, which harnesses the strength of BMC in resolving complex path conditions. For instance, the branch if(x\*x -2\*x +1 == 0) may be extremely difficult for the fuzzer to explore. *EBF* tackles this issue by repeatedly injecting the error statement assert(0) in each conditional branch of the PUT (similar to the approach in [2]). Then, each transformed program (which contains one unique error statement) is independently verified with the BMC tool. If the BMC reaches the error within a timeout, *EBF* converts the resulting counter-example into a fuzzing seed. The seed generation process continues until all injected errors have been detected or the stage timeout has been reached. The seeds we collect during this stage greatly improve the fuzzer performance in the next stage.
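The injection step can be pictured with a minimal Python sketch. It assumes that the branch locations are already known as line indices (a real implementation would obtain them from a compiler front end); the helper name `inject_errors` and the toy program are hypothetical and not part of *EBF*.

```python
def inject_errors(lines, branch_lines):
    """Return one program variant per branch, each with a single assert(0).

    lines        -- source lines of the PUT
    branch_lines -- indices of lines that open a conditional branch body
                    (assumed to be known; hypothetical input)
    """
    variants = []
    for idx in branch_lines:
        variant = list(lines)
        # Insert the error statement as the first statement of the branch.
        variant.insert(idx + 1, "    assert(0);")
        variants.append("\n".join(variant))
    return variants


# Toy PUT with two branches (the "then" and the "else" body).
put = [
    "if (x*x - 2*x + 1 == 0) {",
    "    y = 1;",
    "} else {",
    "    y = 2;",
    "}",
]
variants = inject_errors(put, branch_lines=[0, 2])
```

Each variant would then be handed to the BMC tool separately; a counter-example that reaches the injected `assert(0)` supplies the input values for one seed.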

Falsification Stage. During this stage, *EBF* checks whether the PUT contains any vulnerabilities by fuzzing its inputs and thread interleavings. Due to the current lack of open-source GBF tools for concurrent programs [9], *EBF* uses our own concurrency-aware gray-box fuzzer *OpenGBF*. Its implementation extends *AFL++*, a state-of-the-art GBF for single-threaded programs, by introducing the following concurrency-aware lightweight instrumentation in the PUT.

First, *OpenGBF* injects delays after each instruction at the *LLVM* intermediate representation level. The value of these delays (typically several micro-seconds) is controlled by the fuzzer and implicitly forces the execution of different thread interleavings. Second, *OpenGBF* inserts functions for recording all the information needed for witness generation: assumption values, thread ID, variable names, and function names. Third, *OpenGBF* supports the use of *UndefinedBehaviorSanitizer* [10], *AddressSanitizer* [11] and *ThreadSanitizer* [12] for the detection of vulnerabilities that cannot be expressed as reachability errors (e.g., buffer overflows, thread leaks).
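To see why fuzzer-controlled delays select different interleavings, consider the following deterministic simulation in Python. It is only a model under simplifying assumptions: two threads, instruction cost ignored, and each delay placed in front of the instruction it precedes. The function name `interleaving` is hypothetical and does not correspond to *OpenGBF* code.

```python
def interleaving(delays_a, delays_b):
    """Order the instructions of two threads by simulated completion time.

    delays_x[i] is the fuzzer-chosen delay (in microseconds) that precedes
    instruction i of thread x; the cost of the instruction itself is ignored.
    """
    events = []
    for name, delays in (("A", delays_a), ("B", delays_b)):
        clock = 0
        for i, d in enumerate(delays):
            clock += d
            events.append((clock, name, i))
    # Sort by time; ties are broken by thread name to stay deterministic.
    events.sort()
    return [f"{name}{i}" for _, name, i in events]


schedule_a_first = interleaving([1, 1], [5, 5])
schedule_b_first = interleaving([5, 5], [1, 1])
schedule_mixed = interleaving([1, 6], [3, 1])
```

Changing only the delay values moves the simulated execution from `A0 A1 B0 B1` to `B0 B1 A0 A1` or to the mixed order `A0 B0 B1 A1`, which is the effect the instrumentation relies on.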

Results Aggregation Stage. Finally, *EBF* aggregates the outcomes of the *Safety Proving* and the *Falsification* stages as depicted in the table in Fig. 1. The majority of cases are straightforward: if one of the tools produces an inconclusive verdict (i.e., *Unknown*), then *EBF* relies on the decision provided by the other tool. However, if *OpenGBF* finds a bug in the PUT that is deemed to be safe by BMC, *EBF* reports a *Conflict*. In this case extra information can be obtained from the counter-example produced by the fuzzer.
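The aggregation rule described in this paragraph can be sketched in Python as follows. The sketch is reconstructed from the prose only, not from the table in Fig. 1, and it assumes that the fuzzer reports either *Bug* or *Unknown* (a fuzzer cannot prove safety); the function name `aggregate` is hypothetical.

```python
def aggregate(bmc, gbf):
    """Combine the BMC verdict with the fuzzer verdict.

    bmc -- "Safe", "Bug" or "Unknown"
    gbf -- "Bug" or "Unknown" (assumption: the fuzzer never reports "Safe")
    """
    if bmc == "Safe" and gbf == "Bug":
        # The two tools disagree; the fuzzer's counter-example can be inspected.
        return "Conflict"
    if bmc == "Unknown":
        # Rely on the other tool when BMC is inconclusive.
        return gbf
    # BMC is conclusive and the fuzzer either agrees or is inconclusive.
    return bmc
```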

### 3 Strengths and Weaknesses

*EBF 4.2* participated in the *ConcurrencySafety* category of *SV-COMP 2023*, which comprises four subcategories: *ConcurrencySafety-Main*, *NoDataRace-Main*, *ConcurrencySafety-NoOverflows* and *ConcurrencySafety-MemSafety*.

Regarding the *ConcurrencySafety-Main* subcategory, *EBF 4.2* provided 357 correct results out of 692, with only 1 incorrect false and the rest unknown. In more detail, *EBF* correctly identified 67 safe benchmarks and 249 unsafe benchmarks, which highlights the strength of *EBF* in bug-finding. In addition, *EBF* labeled an extra 41 benchmarks as unsafe, which were not confirmed by the witness validator. Among these benchmarks, there are 10 verification tasks (beginning with *goblint-regression/28 race reach \**) where only two tools can find bugs: *EBF* and *Infer* [13]. At the same time, we hypothesise that the counter-examples provided by *EBF* are more trustworthy than those provided by *Infer* for these 10 tasks. This is because *EBF* is very conservative in its bug-finding claims, with 290 correct false outcomes, 41 unconfirmed, and only 1 incorrect. In contrast, *Infer* produces 330 correct false outcomes and 331 incorrect ones.

Regarding the *NoDataRace-Main* subcategory, *EBF 4.2* only offered partial support for data race detection by enabling *ThreadSanitizer* inside *OpenGBF*. Unfortunately, the BMC engine we used in this year's competition, *ESBMC*, does not yet maintain full support of this safety property. As a consequence, *EBF* provided only 199 correct verification verdicts out of 904, of which 112 were correct true and 87 correct false. At the same time, *EBF* also reported 46 incorrect verdicts (23 incorrect true and 23 incorrect false), which resulted in a negative score for this subcategory.

Regarding the *ConcurrencySafety-NoOverflows* and *ConcurrencySafety-MemSafety* subcategories, *EBF 4.2* did provide support for detecting arithmetic overflows and memory safety violations by enabling *UndefinedBehaviorSanitizer* and *AddressSanitizer*. However, we did not succeed in providing an implementation that was compliant with the competition standards in time. As a result, *EBF* did not feature in these subcategories.

### 4 Tool Setup and Configuration

In order to use *EBF*<sup>3</sup> , the user must set the architecture (32 or 64-bit) with flag -a, the property file path with flag -p, the benchmark file paths, and run the following command from the *EBF* root directory:

```
./scripts/RunEBF.py [-h] [-a {32,64}] [-p PROPERTY_FILE]
                              [benchmark]
```
Furthermore, there are optional flags that can be enabled (e.g., set the time and memory limit for each engine). In SV-COMP 2023 we divided the allotted 15 minutes of CPU time per verification task across the verification stages inside *EBF 4.2* as follows: 400s for the safety proving stage, 120s for the seed generation stage, 240s for the falsification stage, and the remaining 140s were allocated for the results aggregation, counter-example generation and potential execution overheads.
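The time split can be double-checked with a few lines of Python. The dictionary keys below are illustrative names chosen for this sketch and are not *EBF* configuration options.

```python
# Per-stage CPU-time budget in seconds, as reported for SV-COMP 2023.
BUDGET = {
    "safety_proving": 400,
    "seed_generation": 120,
    "falsification": 240,
    "aggregation_and_overhead": 140,
}
TOTAL_LIMIT = 15 * 60  # 15 minutes of CPU time per verification task


def remaining(budget, limit):
    """Seconds left unallocated (negative if the stages exceed the limit)."""
    return limit - sum(budget.values())


unallocated = remaining(BUDGET, TOTAL_LIMIT)
```

The four stage budgets add up to exactly 900 s, so no CPU time is left unallocated.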

### 5 Software Project

We released *EBF* 4.2 under the MIT License, and its code is publicly available on GitHub<sup>4</sup> . All dependencies and installation instructions are listed in the repository README.md.

<sup>3</sup> https://gitlab.com/sosy-lab/sv-comp/archives-2023/-/blob/main/ 2023/ebf.zip

<sup>4</sup> https://github.com/fatimahkj/EBF

### Data-Availability Statement

The tool and all necessary files are available on Zenodo [14][15].

### Acknowledgment

The authors would like to thank Dr. Mustafa A. Mustafa for his constant support. The work in this paper is partially funded by the Engineering and Physical Sciences Research Council (EPSRC) grants EP/T026995/1, EP/V000497/1, EU H2020 ELEGANT 957286, and Soteria project awarded by the UK Research and Innovation for the Digital Security by Design (DSbD) Programme.

### References


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## Goblint: Autotuning Thread-Modular Abstract Interpretation (Competition Contribution)

Simmo Saan<sup>1</sup>(✉)<sup>?</sup>, Michael Schwarz<sup>2</sup>, Julian Erhard<sup>2</sup>, Manuel Pietsch<sup>2</sup>, Helmut Seidl<sup>2</sup>, Sarah Tilscher<sup>2</sup>, and Vesal Vojdani<sup>1</sup>

<sup>1</sup> University of Tartu, Tartu, Estonia {simmo.saan,vesal.vojdani}@ut.ee <sup>2</sup> Technische Universität München, Garching, Germany {m.schwarz,julian.erhard,m.pietsch,helmut.seidl,sarah.tilscher}@tum.de

Abstract. The static analyzer Goblint is dedicated to the analysis of multi-threaded C programs by abstract interpretation. It provides multiple techniques for increasing analysis precision, e.g., configurable context-sensitivity and a wide range of numerical analyses. As a rule of thumb, more precise analyses scale worse, and their precision is not always necessary for solving the task at hand. Therefore, Goblint has been enhanced with autotuning which, based on syntactical criteria, adapts the analysis configuration to the given program such that relevant precision is obtained with acceptable effort.

### 1 Verification Approach

Goblint is a static analysis framework for C programs based on abstract interpretation [6]. It features scalable thread-modular analysis of concurrent programs on top of flow- and context-sensitive interprocedural analysis. The analysis is specified as a side-effecting constraint system [2], which can conveniently express flow-insensitive invariants as well as flow-sensitive information per program point [16] and is solved using a local generic solver [15]. Here, we detail some recent SV-COMP–related advances in Goblint. The previous competition tool paper [11] provides further details on the general approach.

New abstract domains have been added to enhance precision. In addition to interval analysis of integer variables, Goblint now performs interval analysis of floating-point variables following Miné [9], and maintains congruence information [7]. Furthermore, the Apron library [8] has been integrated for relational analysis. Goblint includes novel approaches to relational analysis of concurrent programs [14], inferring relations between jointly-protected global variables.

© The Author(s) 2023. S. Sankaranarayanan and N. Sharygina (Eds.): TACAS 2023, LNCS 13994, pp. 547–552, 2023. https://doi.org/10.1007/978-3-031-30820-8\_34

<sup>?</sup> Jury member

In the previous tool paper, we suggested dynamically tailoring Goblint to the program under analysis. This can increase precision, by activating analyses that are more expensive yet offer crucial precision, and also decrease resource usage, by deactivating redundant analyses. To this end, we have implemented analysis configuration autotuning based on cheap syntactic heuristics on the program, before the analysis begins. The particular features have been chosen according to how expert users might configure Goblint for a given program. Measurements of program size (e.g. number of functions, loops, variables) are taken into account to limit slowdown on larger programs.

Goblint provides a multitude of concurrency-related analyses (e.g. races, symbolic locking patterns, thread joins [14, 16]) that have no use in single-threaded programs, which abound in SV-COMP. Hence, all such analyses are now automatically deactivated for programs that never create any threads.
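A syntactic check of this kind can be sketched in Python as follows. This is only an illustration: it assumes that thread creation always appears as a textual call to `pthread_create`, works on raw source text rather than on a parsed program, and uses made-up analysis names; it is not Goblint's implementation.

```python
import re

# Assumption: thread creation is recognised by a call to pthread_create.
THREAD_CREATE = re.compile(r"\bpthread_create\s*\(")

# Illustrative analysis names, not Goblint's actual option names.
CONCURRENCY_ANALYSES = {"races", "locking_patterns", "thread_joins"}


def select_analyses(source, analyses):
    """Drop concurrency analyses if the program never creates a thread."""
    if THREAD_CREATE.search(source):
        return set(analyses)
    return set(analyses) - CONCURRENCY_ANALYSES


single = "int main() { return 0; }"
multi = "int main() { pthread_t t; pthread_create(&t, 0, f, 0); return 0; }"
enabled = {"intervals", "races", "thread_joins"}

for_single = select_analyses(single, enabled)
for_multi = select_analyses(multi, enabled)
```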

Goblint implements a wide variety of numerical abstract domains, but most are not necessary for every program, which offers many possibilities for autotuning. Interval information is omitted in calling contexts of recursive functions to avoid an explosion of contexts in which they are to be analyzed. While the congruence domain is generally active on small programs, for medium-sized programs it is only enabled for functions involving the modulo operator, either directly or indirectly (up to a fixed depth in the call stack). If the program uses enums, then an integer domain for sets of enumeration values is activated. Octagon analysis is enabled for those local variables which occur most often in linear expressions and conditions. Interval and octagon widening thresholds are extracted from conditional expressions containing constants. Such thresholds are especially useful for flow-insensitive analysis of global variables in multi-threaded programs, since no narrowing is performed on flow-insensitive invariants.
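The idea of widening thresholds can be illustrated with a small Python sketch. It makes two simplifying assumptions: constants are found with a crude regular expression that only looks at the right-hand side of comparisons, and only the upper bound of an interval is widened. Both helper names are hypothetical.

```python
import re


def extract_thresholds(source):
    """Collect integer constants on the right-hand side of comparisons."""
    pattern = re.compile(r"(?:<=|>=|==|!=|<|>)\s*(-?\d+)")
    return sorted({int(m) for m in pattern.findall(source)})


def widen_upper(old_hi, new_hi, thresholds):
    """Widen an interval's upper bound using thresholds.

    A stable bound is kept; an unstable bound jumps to the smallest
    threshold that covers it, or to infinity if no threshold does.
    """
    if new_hi <= old_hi:
        return old_hi
    for t in thresholds:
        if t >= new_hi:
            return t
    return float("inf")


thresholds = extract_thresholds("while (i < 100) { if (i == 50) x = 1; i++; }")
```

With the thresholds 50 and 100, an upper bound that grows from 0 to 1 is widened to 50 rather than to infinity. This matters most where no narrowing step can recover the lost precision afterwards.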

Loop unrolling is a well-known technique to increase the precision of static analysis. Goblint now unrolls loops up to their static bounds or feasible unrolled code size. Loops which contain memory allocation, thread creation, or error function calls, are prioritized since unique heap locations and threads are key to maintaining analysis precision.

Schwarz et al. [13] enhanced Goblint with a suite of concurrent value analyses and evaluated their precision. Following their observations, we use the cheap yet sufficiently precise Protection-Based Reading. Data-race detection was made more precise using may-happen-in-parallel analysis [14], to filter out spurious races with threads that have already been joined or have not yet been created.

### 2 Software Architecture

Goblint is implemented in OCaml and uses an updated fork of CIL [10] as its parser frontend for the C language. It depends on Apron [8] for relational analyses. No other major libraries or external tools are required.

Goblint employs a modular architecture [1] where a combination of analyses can be selected at runtime. Analyses are defined through their abstract domains and transfer functions, which can communicate with other analyses using predefined queries and events. The combined analyses together with the control-flow graphs of the functions yield a side-effecting constraint system [2], which is solved using a local generic solver [15]. The solution is post-processed to determine the verdict and construct a witness.

### 3 Strengths and Weaknesses

Goblint focuses on sound static analysis which is confirmed by the competition: our tool does not produce any incorrect results. A major limitation of our approach is that, due to over-approximation, the tool can only prove the absence of bugs, but not their presence. Thus, when Goblint flags a potential violation, it answers "unknown" in the competition.

In SV-COMP 2023, NoDataRace became an official category and existing ConcurrencySafety reachability tasks were newly included into it. This is where Goblint really shines: it proves 652 out of 783 programs race-free, thereby winning the category. Overall, the strengths and weaknesses of Goblint w.r.t. categories remain the same as described in our previous tool paper. Therefore, we describe here the impact of autotuning, based on our own preliminary comparative evaluation. Unlike official SV-COMP evaluation, we used a 1 GB memory limit, which is sufficient for most tasks Goblint can solve, and no witness validators.

As noted above, the majority of SV-COMP programs across all categories are single-threaded; thus, the greatest improvement comes from disabling all concurrency analyses in those cases. This yields a notable reduction in runtime and memory usage, as shown in Table 1, improving overall efficiency without compromising precision.

The second greatest improvement is due to the use of relational analysis with octagons. Although this incurs a runtime penalty, it increases the number of correct verdicts notably. The improvement is especially visible in NoOverflows, where it yields 104 additional correct results. We also confirmed that the automatic selection of octagon variables is better than tracking all variables: our selection yields more correct verdicts (due to fewer timeouts) while successfully avoiding an unnecessarily large performance penalty.

Autotuning along the other axes is not as impactful. Nevertheless, each leads to Goblint being able to solve tasks it could not otherwise. Hence, a small increase in score is achieved, justifying their use. Although disabling unnecessary concurrency analyses reduces resource usage, overall this performance improvement is canceled out by the simultaneous use of expensive analyses enabled by autotuning, such as octagons. Thus, Goblint can solve more tasks while retaining the same level of overall efficiency observed in previous editions of the competition [3].

Table 1. Reduction in resource usage due to disabling all concurrency analyses for single-threaded programs, as reported by BenchExec using ordinary least squares (OLS) regression.

Many future opportunities for autotuning exist: Goblint implements a number of concurrent value analyses offering different tradeoffs between time and precision [13, 14], but only used the fastest and least precise of these in SV-COMP. If appropriate heuristics for using the more involved analyses are identified, autotuning could enable these when they are likely to yield a benefit. Autotuning could be extended to supply a sequence of configurations, increasing in precision, for a portfolio of analyses, instead of relying on the autotuning to immediately pick the most appropriate configuration. While the current autotuning in Goblint is hand-crafted, machine learning may provide additional improvements.

### 4 Tool Setup and Configuration

Goblint version svcomp23-0-g4f5dcf38f participated in SV-COMP 2023 [4, 12]. It is available in both binary (Ubuntu 22.04) and source code form at our GitHub repository under the svcomp23 tag.<sup>3</sup> The only runtime dependency is Apron [8]. Instructions for building from source can be found in the README.

Both the tool-info module and the benchmark definition for SV-COMP are named goblint. They correspond to running the tool as follows:

```
./goblint --conf conf/svcomp23.json \
          --set ana.specification property.prp input.c
```
Goblint participated in the following categories: ReachSafety, ConcurrencySafety, NoOverflows, SoftwareSystems and Overall, while opting out of MemSafety, Termination and SoftwareSystems-\*-MemSafety.

### 5 Software Project and Contributors

Goblint development takes place on GitHub,<sup>4</sup> while related publications are listed on its website.<sup>5</sup> It is an MIT-licensed joint project of the Technische Universität München (Chair of Formal Languages, Compiler Construction, Software Construction) and University of Tartu (Laboratory for Software Science).

Acknowledgements. This work was supported by Deutsche Forschungsgemeinschaft (DFG) – 378803395/2428 ConVeY and the Estonian Centre of Excellence in IT (EXCITE), funded by the European Regional Development Fund. We would like to thank everyone who has contributed to Goblint over the years, especially the students who contributed various autotunable analyses.

<sup>3</sup> https://github.com/goblint/analyzer/releases/tag/svcomp23

<sup>4</sup> https://github.com/goblint/analyzer

<sup>5</sup> https://goblint.in.tum.de

Data Availability. All data of SV-COMP 2023 are archived as described in the competition report [4] and available on the competition web site. This includes the verification tasks, results, witnesses, scripts, and instructions for reproduction. The version of Goblint as used in the competition is archived together with other participating tools [5] and individually [12] on Zenodo.

### Bibliography



Soha Hussein<sup>1,??,? ? ?</sup>, Qiuchen Yan<sup>1</sup>, Stephen McCamant<sup>1</sup>, Vaibhav Sharma<sup>1</sup>, and Michael W. Whalen<sup>1</sup>

<sup>1</sup> University of Minnesota, Minneapolis, MN, USA {soha,yanxx297,smccaman,vaibhav,mwwhalen}@umn.edu

Abstract. Java Ranger is a path-merging tool for Java Programs. It identifies branching regions of code and summarizes them by generating a disjunctive logical constraint that describes the behavior of the code region. Previously, Java Ranger showed that a reduction of 70% of execution paths is possible when used to merge branching regions of code that support numeric constraints.

In this paper, we describe the support of two additional features since participation in SV-COMP 2020: symbolic array and symbolic string operations. Finally, we present a preliminary evaluation of the effect of the structure of the disjunctive constraint on the solver's performance. Results suggest that certain constraint structures can speed up the performance of Java Ranger.

### 1 Introduction

Path-merging [1,7,8] is a technique that speeds up the execution of Dynamic Symbolic Execution (DSE) by collapsing paths within code regions into a disjunctive logical constraint. Java Ranger (JR) [12] is a path-merging tool for Java Programs. It summarizes symbolic branches during execution. JR generates the disjunctive logical constraint for a code region predicated on a symbolic branch by using a sequence of transformations. For example, JR alternates between substituting values for local variables in its summary and inlining method summaries to eliminate dynamically dispatched method invocations. See [11] for more information.

### 2 Path Merging Extensions and Results

© The Author(s) 2023. S. Sankaranarayanan and N. Sharygina (Eds.): TACAS 2023, LNCS 13994, pp. 553–558, 2023. https://doi.org/10.1007/978-3-031-30820-8\_35

<sup>?</sup> The research described in this paper has been supported in part by the National Science Foundation under grant 1563920, and Google Summer of Code.

<sup>??</sup> Jury member

<sup>? ? ?</sup> Lecturer on a Leave of Absence, Ain Shams University, Cairo, Egypt, soha.hussien@cis.asu.edu.eg

Despite handling many of the Java language features, in SV-COMP 2020 [10] JR did not support symbolically executing string functions. It also did not summarize arrayload and arraystore statements that exist outside a code region predicated on a symbolic branch. For example, if a and i are symbolic integers, JR could summarize a region of the form if(a) {myval = arr[i]...} but not myval = arr[i]. More precisely, the newly introduced features to JR include:


$$myval := \text{ite}(i == 0, arr[0], \text{ite}(i == 1, arr[1], arr[2]))$$

Similarly, we encode the arraystore of the form arr[i] = myval as

$$\begin{array}{l} arr[0]_{new} := \text{ite}(i == 0, myval, arr[0]_{old})\\ \wedge\ arr[1]_{new} := \text{ite}(i == 1, myval, arr[1]_{old})\\ \wedge\ arr[2]_{new} := \text{ite}(i == 2, myval, arr[2]_{old}) \end{array}$$

where $arr[i]_{old}$ and $arr[i]_{new}$ denote the old and the new values of the array arr at index i.
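The two encodings generalize to any concrete array length. The following Python sketch prints them as plain text; it assumes the array length is known concretely and that the index is in bounds (the last element serves as the default case of the load). The helper names and the textual notation are illustrative and do not reflect JR's internal representation.

```python
def load_formula(arr, idx, length):
    """Nested if-then-else term for a symbolic-index array load."""
    term = f"{arr}[{length - 1}]"  # last element is the default case
    for k in range(length - 2, -1, -1):
        term = f"ite({idx} == {k}, {arr}[{k}], {term})"
    return term


def store_formulas(arr, idx, val, length):
    """One update constraint per array cell for a symbolic-index store."""
    return [
        f"{arr}[{k}]_new := ite({idx} == {k}, {val}, {arr}[{k}]_old)"
        for k in range(length)
    ]


load3 = load_formula("arr", "i", 3)
store3 = store_formulas("arr", "i", "myval", 3)
```

For a length of 3, `load3` and the conjunction of the entries in `store3` coincide with the two formulas shown above.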

3. Symbolically Executing Symbolic Strings: We added support for some basic string operations of the String and StringBuilder classes; this includes, but is not limited to, charAt, concat, contains, endsWith, equals, indexOf, length, replace, startsWith, isEmpty and substring.

#### 2.1 Run Configuration

In addition to the JR configurations used in SV-COMP 2020 [10], we used the configurations below to turn on the added features:


#### 2.2 Results

To understand the value of JR's extensions above, we compared the old JR tool [9] from SV-COMP 2020, which supported neither symbolic arrays nor symbolic strings, with the JR version participating in 2023. We ran both versions on the verification tasks used in SV-COMP 2023. The results in Table 1 show an increase in the number of correctly solved tasks from 429 to 475 and, more importantly, a reduction in incorrect results from 97 to zero. These improved scores show the importance of the added support.

Table 1: Results of the JR version participating in 2020 versus the improved 2023 version

Unfortunately, because the current version of JR has no support for witness generation, none of the correctly reached false verdicts were counted in the SV-COMP 2023 score [2], which resulted in JR scoring 400 points instead of 675. In the future, we plan to extend JR to support witness generation.

### 3 Formula Structure in Path-Merged String Constraints

Fig. 1 shows loopCharAt: an SV-COMP 2023 verification task [3] (from an example of Avgerinos et al. [1]) that can dramatically benefit from path-merging. The task accepts a symbolic string arg, and checks each character to see if it is the letter 'B'. If so, it increments counter. The assertion fails if the value of the counter can be 121.

```
public static void loopCharAt(String arg) {
    int counter = 0;
    for (int i = 0; i < arg.length(); i++) {
        char myChar = arg.charAt(i);
        if (myChar == 'B') counter++;
    }
    assert (counter != 121);
}
```

Fig. 1: loopCharAt Example

For a symbolic string of length n, this code has 2<sup>n</sup> execution paths, since each character can be B or not B independently. But applying path merging to the if statement leads to a single execution path for a given length string. While JR sees this expected asymptotic benefit (one path per string length), reaching the assertion failure takes more than 2 hours, well beyond the competition time limit. Most time is spent in the solver, so we investigated whether changing the syntax of the query could improve performance.
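The gap between the two exploration strategies can be made concrete with a short Python sketch. It enumerates the branch outcomes that a non-merging executor would have to explore for a string of length n, and builds the single merged summary of the counter as text. The helper names and the textual `ite` notation are illustrative assumptions, not JR's output format.

```python
from itertools import product


def explored_paths(n):
    """Branch outcomes a non-merging symbolic executor must enumerate."""
    return list(product((True, False), repeat=n))


def merged_counter(n):
    """Single summary of counter for a length-n string after merging."""
    terms = [f"ite(arg[{i}] == 'B', 1, 0)" for i in range(n)]
    return "counter = " + " + ".join(terms) if terms else "counter = 0"


paths_for_3 = explored_paths(3)
summary_for_2 = merged_counter(2)
```

For n = 3 there are already 8 separate paths, while the merged form is a single expression whose size grows only linearly with n.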

Fig. 2: Average running time by size and query type

Each query generated for the assert statement asks whether an n-character string can contain 121 (or more generally, k) B characters; this query is satisfiable if 0 ≤ k ≤ n. We used a script to generate variations of the query for different values of n and k, and different semantically equivalent ways of expressing the constraints. We then measured the time to solve the queries using Z3 4.8.15 with the seq string solver, on an Intel i7-3770 workstation running Ubuntu 20.04. The choice of k appeared to have little effect on performance, so we report the results of averaging over runs with 0 ≤ k ≤ n + 1. Figure 2 shows how the running time grows with n, and that the query style has a large impact on performance.

We describe the query styles in order of increasing overhead. Because no complex string operations are needed, an equivalent query can be expressed in a simple bit-vector (QF\_BV) logic. This was by far the fastest, and the only style where the running time appears to grow linearly with n. The remaining styles use a logic of strings and integers (QF\_SLIA), and we started with the constraint style that seemed most natural to write by hand ("clean") and sequentially added complexities to make the constraints increasingly similar to those JR produces. All these QF\_SLIA styles appear to slow down as a cubic polynomial in n, as illustrated by the best-fit lines. Two features of JR's queries had little effect on performance: expressing the string length with a series of inequalities (in JR these come from the loop), and introducing a temporary variable corresponding to each update of the counter. A modest but measurable slowdown came from expressing the effect of the merged region with OR and AND operations, instead of the functional if-then-else operator. A final dramatic slowdown came from constraining the value of each character via its character code (= (str.to\_code (str.at s 0)) 66) (natural because Java's char is an integer type) instead of as a one-character string (= (str.at s 0) "B"). These results suggest that this verification task could become feasible in 15 minutes if either JR or solvers can transform the slow-to-solve forms into fast-to-solve ones.
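A generator for two of the QF\_SLIA variants might look like the Python sketch below. It is not the script used for the measurements; the function name, the `style` parameter, and the exact shape of the query are assumptions made for illustration. Only the two character constraints are taken from the text.

```python
def count_b_query(n, k, style="string"):
    """SMT-LIB query: can an n-character string contain exactly k 'B's?

    style="string" compares one-character strings,
    style="code"   compares character codes (66 is the code of 'B').
    """
    def is_b(i):
        if style == "string":
            return f'(= (str.at s {i}) "B")'
        return f"(= (str.to_code (str.at s {i})) 66)"

    terms = [f"(ite {is_b(i)} 1 0)" for i in range(n)]
    if not terms:
        total = "0"
    elif len(terms) == 1:
        total = terms[0]  # "+" needs at least two arguments
    else:
        total = "(+ " + " ".join(terms) + ")"
    return "\n".join([
        "(set-logic QF_SLIA)",
        "(declare-const s String)",
        f"(assert (= (str.len s) {n}))",
        f"(assert (= {total} {k}))",
        "(check-sat)",
    ])


fast_style = count_b_query(2, 1, style="string")
slow_style = count_b_query(2, 1, style="code")
```

The two outputs differ only in how each character is compared, so they can be fed to a solver side by side to observe the effect of that single syntactic choice.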

### 4 Data-Availability Statement

Java Ranger is developed at the University of Minnesota. It is continuously maintained on GitHub [6]. Readers interested in reproducing the Java Ranger results from the competition can find an artifact in [4,5].

### References



## Korn—Software Verification with Horn Clauses (Competition Contribution)

Gidon Ernst<sup>?</sup>

LMU Munich, Munich, Germany gidon.ernst@lmu.de

Abstract. Korn is a software verifier that infers correctness certificates and violation witnesses automatically using state-of-the-art Horn-clause solvers, such as Z3 and Eldarica. The solvers are used in a portfolio together with cheap random sampling, where the latter can be very effective at finding counterexamples. Korn performed best in the Recursive sub-category of SV-COMP 2023.

Keywords: Software Verification · Horn Clauses · Loop Contracts

### 1 Verification Approach

Korn is a verifier for C programs that is based on a translation into systems of constrained Horn clauses [5,12]. Therein, each program location is abstracted by a second-order predicate over the program variables which are active at that point. The system of Horn clauses has a (second-order) solution if and only if the program is correct. Horn clause encodings are a convenient intermediate representation that is linear in the size of the program and that is inherently modular, such that loops, procedure contracts, and non-local control flow like gotos and labels can be easily abstracted (see Sect. 3 regarding category Recursive).

Korn uses state-of-the-art solvers to determine the satisfiability of the generated Horn clause system (cf. Sect. 2); for SV-COMP specifically, it uses Z3 [6] and Eldarica [15]. Both solvers generate evidence for correctness of a given program in terms of models that describe how the unknown predicates need to be instantiated. Moreover, Eldarica can generate counterexample traces, and Korn instruments the Horn clause system to get the concrete values returned by the \_\_VERIFIER\_nondet\_\*() functions on an error path. For these reasons, Korn tends to produce detailed correctness and violation witnesses.

The different solvers have different strengths and weaknesses. To that end, Korn implements a portfolio approach with several sequential stages. The configuration for SV-COMP 2023 [2] is as follows, where the specific timeouts for the individual tools are chosen heuristically based on prior experiments:

1. Initially, 10s of random sampling with small values is performed. For each input, the sampler chooses uniformly among the number 0 and values of 2, 5, and 10 bits, respectively, possibly with a sign. The absence of overly large values avoids very long-running loops when the counter is nondeterministic. There is no particular justification for the sampling scheme, but it is effective.

© The Author(s) 2023. S. Sankaranarayanan and N. Sharygina (Eds.): TACAS 2023, LNCS 13994, pp. 559–564, 2023. https://doi.org/10.1007/978-3-031-30820-8\_36

<sup>?</sup> Jury Member
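The sampling scheme of the first portfolio stage can be sketched in Python as follows. The sketch encodes one possible reading of the description: the bit width is drawn uniformly from 0 (the constant 0), 2, 5 and 10, and the sign is flipped with probability 1/2. Korn's sampler is implemented in C, so the function name and the use of Python's `random` module are assumptions of this illustration.

```python
import random


def sample_input(rng):
    """Draw one nondeterministic input value.

    Interpretation (an assumption): choose uniformly among the constant 0
    and random magnitudes of 2, 5 or 10 bits, then negate the value with
    probability 1/2.
    """
    bits = rng.choice([0, 2, 5, 10])
    if bits == 0:
        return 0
    value = rng.getrandbits(bits)
    return -value if rng.random() < 0.5 else value


rng = random.Random(0)  # fixed seed for reproducibility
samples = [sample_input(rng) for _ in range(1000)]
```

All sampled magnitudes stay below 2<sup>10</sup>, which keeps loops with a nondeterministic bound short.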


Korn is overall similar to SeaHorn [13] but it operates on the C source level instead of LLVM. Korn aims at a rather different design point, namely to favor simplicity over features, therefore offering a good platform for experiments. Eldarica has its own C frontend that supports a different set of features, recently published as TriCera [11]. Here the main distinction is that Korn uses a large block encoding, such that the verification conditions closely reflect the structure of the program. Korn offers a second verification approach with loop contracts [16,14,7]. This was the original motivation to develop the tool, and neither SeaHorn nor TriCera supports this feature. However, it was not used for SV-COMP because it offers no advantages [10] and because the encoding of loop contracts into loop invariants would require quantifiers in the witness format.

### 2 Software Architecture

Korn is mainly written in the JVM language Scala.<sup>1</sup> The front-end uses a custom parser, generated with jFlex and Beaver. The random sampler relies on native execution which links the benchmark task with a C file \_\_VERIFIER\_random.c that implements the \_\_VERIFIER\_nondet\_\* functions. Verification conditions are generated in the fragment of SMT-LIB of the HORN logic.<sup>2</sup> Korn can invoke any compliant solver as a backend, using either its standard input or a file to communicate the verification task. There is explicit support for Z3 [12] and Eldarica [15], e.g., to pass timeouts with tool-specific options or to produce models resp. counterexamples. Currently, Korn uses the theories of integers and arrays.

In order to produce SV-COMP correctness witnesses, Korn can read the models generated by the backend-solvers, and translate them back into C expressions. The correctness witnesses produced currently are derived from the invariants that are reported back by the Horn solvers (get-model resp. -ssol flag of Eldarica). Violation witnesses are either read off the output of Eldarica (-cex flag), or from the output of the random sampler, as a sequence of nondeterministic choices. When a counterexample is found, a test harness is compiled to confirm whether reach\_error() is in fact called.
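To make the role of a solver model concrete, the following is a minimal sketch rather than Korn's actual encoding. It assumes a hypothetical program `x = 0; while (x < n) x++; assert(x == n);` with precondition `n >= 0`, writes its three Horn clauses as Python checks, and tests a candidate interpretation of the unknown predicate `Inv` by brute force over a small range. The brute-force check merely stands in for a Horn solver and proves nothing beyond the sampled range; the names `inv` and `clauses_hold` are illustrative.

```python
from itertools import product


def inv(x: int, n: int) -> bool:
    """Candidate model for the unknown predicate Inv(x, n)."""
    return 0 <= x <= n


def clauses_hold(candidate, bound: int = 20) -> bool:
    """Check the three Horn clauses for all x, n in [-bound, bound]."""
    for x, n in product(range(-bound, bound + 1), repeat=2):
        # Initialization: n >= 0 and x == 0 implies Inv(x, n).
        if n >= 0 and x == 0 and not candidate(x, n):
            return False
        # Loop step: Inv(x, n) and x < n implies Inv(x + 1, n).
        if candidate(x, n) and x < n and not candidate(x + 1, n):
            return False
        # Loop exit: Inv(x, n) and not (x < n) implies the assertion x == n.
        if candidate(x, n) and not (x < n) and not (x == n):
            return False
    return True


print(clauses_hold(inv))                     # the candidate satisfies all clauses
print(clauses_hold(lambda x, n: x == 0))     # too strong: not preserved by the loop
```

A model such as `0 <= x <= n` is exactly the kind of expression that can be translated back into C and reported as a loop invariant in a correctness witness.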

<sup>1</sup> https://scala-lang.org

<sup>2</sup> https://chc-comp.github.io/format.html

### 3 Discussion: Strengths and Weaknesses

Korn supports a substantial fraction of the C language, with the greatest limitation being the lack of support for dynamic data structures (see website for a detailed account), which means that currently any task which requires a memory model is out of scope. The translation supports most control structures, including goto and labels. With respect to solving verification tasks, Korn inherits the strengths and limitations of the underlying solvers. Tasks for which invariants and procedure contracts are expressible in linear integer arithmetic are typically proved quickly by the solvers, whereas they struggle on tasks with arrays and quantified invariants. Honoring these aspects, Korn participated in four categories for property ReachSafety: ControlFlow, Loops, Recursive, and XCSP.

The theoretical approach used by Korn is sound and complete relative to the solver capabilities. Korn produced no incorrect result in SV-COMP 2023, but there are circumstances which could lead to wrong verdicts. With respect to C semantics, Korn currently makes the following trade-offs:


The random sampler is very effective—in SV-COMP 2023 it discovered all 210 violations reported by Korn, of which 204 are found within 2 seconds. Sampling of small non-zero values is crucial, e.g., Ackermann02.c falsifies with input vector [2,0]; using all zero inputs still finds 57 of these 210 violations.
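The sampling scheme can be sketched as follows. This is an illustration under one reading of the description, not Korn's implementation (which runs natively in C): a size class is chosen uniformly among 0, 2, 5, and 10 bits, then a magnitude within that class, then a sign. The function names are hypothetical.

```python
import random


def sample_input(rng: random.Random) -> int:
    """Draw one small value for a nondeterministic input."""
    bits = rng.choice([0, 2, 5, 10])      # size class; 0 bits means the constant 0
    if bits == 0:
        return 0
    magnitude = rng.randrange(1 << bits)  # uniform in 0 .. 2**bits - 1
    sign = rng.choice([1, -1])            # "possibly with a sign"
    return sign * magnitude


def sample_vector(rng: random.Random, length: int) -> list:
    """Draw an input vector, one value per nondeterministic call."""
    return [sample_input(rng) for _ in range(length)]


rng = random.Random(2023)
print(sample_vector(rng, 2))
```

Under this reading, small vectors such as [2,0] are drawn with non-negligible probability on every attempt, which is consistent with most violations being found within a few seconds.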

A key strength of Horn clause encodings is that they are inherently modular. This means that loops and recursion are abstracted by invariants resp. pre-/postcondition pairs. The latter enable Korn to significantly outperform all other tools in category Recursive. Plausible explanations are that classic state-space exploration techniques struggle to abstract call stacks or maybe that techniques developed for loops like k-induction have simply not been adapted well to recursive procedures. For Horn clause encodings, on the other hand, both abstractions are uniform and solvers are largely agnostic to the purpose of predicates. As a downside of enforcing modular proofs, Korn is currently unable to compete in category Arrays, where finding the quantified invariants is hard but state-space exploration succeeds on tasks with fixed loop bounds.

Table 1. Official results (number of tasks solved) compared with the results of the best-scoring other tool in each category and with post-competition experiments after fixing an issue with the submitted Korn verifier archive, which did not run Eldarica at all. # Tasks is the number of tasks supported by Korn vs. category size. The result marked by † is without counterexample confirmation. The official results can be found at https://sv-comp.sosy-lab.org/2023/results/results-verified/

Unfortunately, in the 2023 competition, Eldarica did not run at all due to an unknown problem with the verifier archive, so that Korn terminated prematurely and missed out on many results. Table 1 presents results from re-running the evaluation on the competition hardware. This produces 208 additional proofs from Eldarica in category Loops with a hypothetical score of 755 compared to 323 in SV-COMP 2023, although the actual score would be lower because usually not all witnesses are confirmed.

### 4 Software Project, Configuration & Participation

The implementation of Korn is available at https://github.com/gernst/korn under the MIT license; installation instructions are part of the README. The SV-COMP 2023 submission was packaged from commit 8e968dd and shows version 0.4. The included solvers are Z3 4.11.2 64 bit (default configuration) and Eldarica v2.0.8 (using -portfolio). The command line in SV-COMP 2023 is

./run -write -model -witness witness.graphml -confirm \ -random 10 -timeout 20 -z3 -timeout 900 -eld:portfolio <file.c>

Participation: ControlFlow, Loops, Recursive, XCSP for ReachSafety.

Contributors. Korn is developed and maintained by the author. G. Alexandru [1] and J. Blau have contributed insights to the approach of loop contracts [7].

### Data Availability Statement

The tool archive packaged for SV-COMP 2023 is part of the official tools artifact [4] and also available separately [9]. The official competition results [3] are complemented with our post-competition evaluation, based on commit 92e6732 and are available at [8].

### References



## **Mopsa-C: Modular Domains and Relational Abstract Interpretation for C Programs (Competition Contribution)**

Raphaël Monat<sup>1,⋆</sup>, Abdelraouf Ouadjaout<sup>2</sup>, and Antoine Miné<sup>3</sup>

<sup>1</sup> Univ. Lille, Inria, CNRS, Centrale Lille, UMR 9189 CRIStAL, F-59000 Lille, France

<sup>2</sup> Grenoble, France

<sup>3</sup> LIP6, Sorbonne Université, F-75005, Paris, France

**Abstract.** Mopsa is a multilanguage static analysis platform relying on abstract interpretation. It is able to analyze C, Python, and programs mixing these two languages; we focus on the C analysis here. It provides a novel way to combine abstract domains, in order to offer extensibility and cooperation between them, which is especially beneficial when relational numerical domains are used. The analyses are currently flow-sensitive and fully context-sensitive. We focus only on proving programs to be correct, as our analyses are designed to be sound and terminating but not complete. We present our first participation in SV-Comp, where Mopsa earned a bronze medal in the *SoftwareSystems* category.

**Keywords:** Static analysis · Abstract interpretation · Competition on Software Verification · SV-Comp

### **1 Verification Approach: the Mopsa platform**

Mopsa is an open-source static analysis platform relying on abstract interpretation [4]. The implementation of Mopsa aims at exploring new perspectives for the design of static analyzers. Mopsa has a triple objective:

**–** To allow developers to define abstract domains in a modular fashion – that is, as independently of each other as possible. In particular, this means that each abstract domain can easily be enabled or disabled to customize an analysis.

**–** To allow different abstract domains to cooperate and communicate in a relational way. Previous analyzers were able to combine domains in tree-shaped structures [5, Fig. 1]. Mopsa allows sharing between abstract domains, meaning schematically that the domains can be combined into an acyclic graph.

**–** To support the analysis of multiple languages while reusing existing abstractions. Mopsa is able to analyze C [16], Python [13], and multilanguage Python/C programs [14]. The Michelson smart contract language is being added [1]. Other safe analyzers, such as Astrée [5], Frama-C [6], Goblint [19], and TAJS [8] are specialized in analyzing a single language.

*<sup>⋆</sup>* Jury member

A. Ouadjaout—Unaffiliated.

<sup>©</sup> The Author(s) 2023

S. Sankaranarayanan and N. Sharygina (Eds.): TACAS 2023, LNCS 13994, pp. 565–570, 2023. https://doi.org/10.1007/978-3-031-30820-8\_37

These aims are achieved through a dynamic expression rewriting mechanism, and a unified signature for abstract domains and iterators. Journault et al. [9] describe the core Mopsa principles, and Monat [12, Chapter 3] provides an in-depth introduction to Mopsa's design.

The C analysis which we rely on for this competition is based on the work of Ouadjaout and Miné [16]. The analysis works by induction on the syntax, is fully context- and flow-sensitive, and is committed to soundness. It targets complete programs that have not been modified: Mopsa can be seamlessly integrated in standard build systems (such as make), it supports main functions with symbolic arguments, and it includes precise stubs for most of the standard C library. Our benchmark analyses include, for instance, several tools from coreutils.

Mopsa is written in 50,000 lines of OCaml code [21], and relies on the Clang frontend to parse C programs. It relies on the Apron library [7] to handle relational numerical abstract domains.

### **2 Software Architecture: the SV-Comp driver**

By default, the C analysis of Mopsa detects all the runtime errors that may happen in the analyzed program (NULL pointer dereferences, integer overflows, ...), while SV-Comp tasks focus on a specific property at a time (reachability of a function, validity of memory accesses, ...). We thus created an SV-Comp specific driver. It takes as input the task description (program, property, data model). It runs increasingly precise C analyses defined in Mopsa until the property of interest is proved or the most precise analysis is reached. Each analysis result is postprocessed by the driver to check if the property is proved.

An analysis configuration defines the set of domains used, as well as their parameters allowing modifications of the precision-efficiency ratio. The four increasingly precise configurations we use are the following:

– Conf. 1 is the base analysis relying on intervals and cells (a field-sensitive, low-level memory abstraction able to handle type-punning, pointer casts, C unions, ...) [11]. Global structures having up to 5 fields are precisely initialized.

– Conf. 2 additionally enables the string length domain [10], which precisely tracks the position of the first 0 in byte arrays. Static struct initialization is done precisely for structures having up to 50 fields.

– Conf. 3 adds a polyhedra abstract domain. This includes tracking numerical relations between string lengths and scalar variables.<sup>4</sup> It relies on a static packing heuristic [5] to achieve a good precision-scalability tradeoff.

– Conf. 4 adds a congruence abstract domain, delayed widenings, and widening with thresholds.

A schematic representation of the domains used in these analyses is shown in Figure 1. The SV-Comp driver is written in 250 lines of Python code.
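The driver logic described above can be sketched in a few lines. This is an illustration, not the actual Mopsa driver: each configuration is modelled as a hypothetical callable that returns `True` when the analysis result proves the property, and the sketch ignores resource limits and witness generation.

```python
from typing import Callable, Sequence

# Hypothetical interface: an analysis takes a task and reports whether
# the property of interest was proved (True) or alarms remain (False).
Analysis = Callable[[str], bool]


def run_driver(task: str, configurations: Sequence[Analysis]) -> str:
    """Run increasingly precise analyses until one proves the property."""
    for analyze in configurations:   # ordered from cheapest to most precise
        if analyze(task):
            return "true"            # property proved, stop early
    # Remaining alarms may be real bugs or false alarms, so no "false"
    # verdict is ever produced by an over-approximating analysis.
    return "unknown"


# Stub analyses standing in for Conf. 1 to Conf. 4.
stubs = [lambda t: False, lambda t: False, lambda t: True, lambda t: True]
print(run_driver("example.c", stubs))  # prints "true" (third stub succeeds)
```

Ordering the configurations by cost means that tasks provable with intervals alone never pay for the polyhedra domain.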

<sup>4</sup> In this case, Mopsa's ability to share abstract domains comes in handy. With a tree, we would have to "linearize" the domains and put either cells or string length on top of the other. This makes reduction more difficult (e.g., Astrée uses a global reduction system on the whole tree, while we can use local reductions between two domains).

**Fig. 1.** Configurations for Mopsa-C analyses used in SV-Comp. Dotted rectangles indicate optionally enabled domains. "U.\*" domains are shared between the analysis of different languages, while the others are C-specific. The sequence operator lets the domain on the left handle the analysis of a given statement: if it cannot, the analysis continues with the domain on the right. The composition operator allows multiple domains to share the same underlying domain. Products let both domains analyze the given statement. In the case of a reduced product, a reduction operation is applied after the analysis of a statement.


**Fig. 2.** Results of the increasingly precise analyses (21220 tasks in total, 12636 correctness tasks). Conf. 2 is able to prove 738 tasks correct in addition to the 5695 proved by conf. 1, although 86 tasks reach the resource limits when analyzed by conf. 1 and 2. Mopsa yields unknown in the analysis of the other tasks.

### **3 Strengths and Weaknesses**

Mopsa participated in all categories targeting reachability, memory safety and overflow properties: *ReachSafety, MemSafety, NoOverflows* and *SoftwareSystems*. It did not compete in the datarace and termination categories. The competition report [2] details all results.

Mopsa relies on over-approximations to guarantee soundness and termination of its analyses. As such, Mopsa scales well on SV-Comp benchmarks: the successive analyses described in Section 2 yield a result within the allocated resources in 91% of the tasks (and 98.5% of the cases for our cheapest analysis). We show the detailed precision benefits of each analysis for the benchmarks in Figure 2. Thanks to Mopsa's scalability and commitment to soundness, we have been able to discover and fix defects within SV-Comp benchmarks which were not discovered by previous tools. In particular, we fixed 164 task definitions, as well as 23 programs with unintended issues in their source code.<sup>5</sup> Mopsa is especially competitive in the *SoftwareSystems* category, focusing on verifying real software systems: it ranked third for our first participation.

Our approach is scalable but not complete: we can only prove programs correct. In other cases, we cannot decide if the issues we found are real bugs or false alarms: we return "unknown" in all these cases to avoid yielding incorrect results. Thus, we can only obtain points on correctness verification tasks, which represents around 58% of the current tasks. Our future work includes finding approaches to exhibit real counterexamples when they exist.

In addition, our analyses are not precise enough for some small but intricate benchmarks (for example, on arrays). In particular, the current version of Mopsa does not support partitioning the abstract state into different ones to improve its precision. We plan to add this classic feature for SV-Comp's next edition. For an over-approximating analyzer, Mopsa is nevertheless quite precise: Mopsa is able to prove around 8% more tasks than Goblint [19, 20] (the leading state-of-the-art abstract interpreter running in SV-Comp).

Finally, the SV-Comp driver we built does not extract precise witnesses from the analyses. Indeed, the case of invariant generation for loops defined in functions called in different contexts seems open for now: Saan [18] observed that complex, interprocedural witnesses do not help the witness verifiers. However, the trivial correctness witnesses we generate are validated in 96.4% of the cases.

### **4 Software Project and Contributors**

Mopsa is currently available on GitLab [17], and released under an open-source license (GNU LGPL v3). Mopsa was originally developed at LIP6, Sorbonne Université following an ERC Consolidator Grant award to Antoine Miné. Mopsa is now developed in other places, including Inria, Airbus, and Nomadic Labs. We thank Matthieu Journault for being one of the initial contributors to Mopsa. This first participation in SV-Comp has spurred a lot of interesting discussions within our development team, and led to 20 bugfixes and new features.

<sup>5</sup> We also contributed to the benchmarks used in SV-Comp, by adding tasks to check overflows from the Juliet Benchmarks (6156 new tasks), and by reviewing 12 merge requests from the community.

**Data-Availability Statement** The exact version of Mopsa that participated in SV-Comp 2023, and our specific driver are available as a Zenodo archive [15]. A global tool archive is also available [3].

**Acknowledgements.** We thank Simmo Saan for his precious advice on how to start integrating our tool within SV-Comp.

### **References**





## PIChecker: A POR and Interpolation based Verifier for Concurrent Programs (Competition Contribution)<sup>⋆</sup>

Jie Su, Zuchao Yang, Hengrui Xing, Jiyu Yang, Cong Tian<sup>⋆⋆</sup>, and Zhenhua Duan

ICTT and ISN Lab, Xidian University, Xi'an 710071, China {jsu 3,mujueke,morui,jiyuy2024}@stu.xidian.edu.cn, {ctian,zhhduan}@mail.xidian.edu.cn

Abstract. PIChecker is a tool for verifying reachability properties of concurrent C programs. It moderates the trace-space explosion problem, aggravated by thread alternation, through utilizing the PC-DPOR and C-Intp techniques. The PC-DPOR technique constructs a constrained dependency graph to refine dependencies between transitions. With this basis, the inherent imprecision of the dependence over-approximation can be overcome. Thereby, many redundant equivalent traces are prevented from being explored. On the other hand, the C-Intp technique performs conditional interpolation to confine the reachable regions of states, so that infeasible conditional branches which occur more frequently in concurrent verification tasks could be pruned automatically. We have implemented the above techniques on top of the open-source program analysis framework CPAchecker.

Keywords: Partial-Order Reduction · Interpolation · Concurrent Program · Model Checking

### 1 Verification Approach

Program synthesis [11] and verification [5] are two ways to improve the quality of software. In this paper, we propose a tool, namely PIChecker, that utilizes the PC-DPOR [9] and C-Intp [8] techniques to verify the reachability properties of concurrent programs. These techniques work in two different ways, equivalent trace class partitioning and infeasible conditional branch pruning, to reduce the search space in model checking.

© The Author(s) 2023

S. Sankaranarayanan and N. Sharygina (Eds.): TACAS 2023, LNCS 13994, pp. 571–576, 2023. https://doi.org/10.1007/978-3-031-30820-8\_38

<sup>⋆</sup> This research is supported by the National Natural Science Foundation of China (No. 62192734, No. 61732013 and No. 62172322).

<sup>⋆⋆</sup> The corresponding author.

The PC-DPOR technique addresses the problem that the coarse dependency approximation of transitions used in many POR [6] approaches significantly increases the number of equivalent trace classes to be explored. In order to reduce unnecessary exploration, the PC-DPOR technique constructs a constrained dependency graph (CDG) to refine the dependencies between transitions, where the edges in a CDG represent the dependency constraints under which transitions from different threads depend on each other. The first configuration in Fig. 1 combines this technique with BDD-based reachability analysis to explore the reachable state-space of a concurrent program. At each state s, if there are isolated transitions which have no connection with the nodes of other threads in the CDG, then only one reachable successor state s′ corresponding to an isolated transition will be explored (i.e., the enabled transitions of other threads will be pruned). We have proved that the prioritized exploration strategy for isolated transitions still provides full coverage of all program behaviors [9]. This prioritized exploration continues until a checking state without any successor via an isolated transition is reached. Thereafter, the dependency between any two different transitions t and t′ at a checking state can be dynamically determined by checking whether their dependency constraint holds at the checking state. If the constraint does not hold (i.e., t is independent of t′ at the current checking state), then only one of the execution orders t · t′ and t′ · t will be explored. On the basis of the CDG, the inherent imprecision of traditional dependence over-approximation is overcome and many redundant equivalent traces can be saved from being explored.
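The dynamic dependency check can be illustrated with a small sketch. It is not PIChecker's implementation (which is written in Java on top of CPAchecker); the types and names below are hypothetical. Two transitions from different threads write `a[i]` and `a[j]`. A static over-approximation would treat them as always dependent because both touch the array `a`, whereas the dependency constraint `i == j` is evaluated at the current state.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

State = Dict[str, int]  # hypothetical: variable name -> concrete value


@dataclass(frozen=True)
class Transition:
    name: str    # statement text, for display only
    thread: int  # id of the thread that owns the transition


def orders_to_explore(
    t: Transition,
    u: Transition,
    constraint: Callable[[State], bool],
    state: State,
) -> List[Tuple[Transition, Transition]]:
    """Return the execution orders that must be explored at this state."""
    if constraint(state):
        # Dependent here: both interleavings may lead to different states.
        return [(t, u), (u, t)]
    # Independent here: both orders are equivalent, one representative suffices.
    return [(t, u)]


t = Transition("a[i] = 1", thread=1)
u = Transition("a[j] = 2", thread=2)
same_cell = lambda s: s["i"] == s["j"]  # dependency constraint on the CDG edge

print(len(orders_to_explore(t, u, same_cell, {"i": 0, "j": 1})))  # 1
print(len(orders_to_explore(t, u, same_cell, {"i": 3, "j": 3})))  # 2
```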

On the other hand, the C-Intp technique focuses on pruning the infeasible conditional branches that may be explored in traditional abstraction-refinement iterations [7] when predicates are insufficient. At each state s, besides the reachability check of error locations, the C-Intp technique also inspects whether there exists any path that contains infeasible conditional branches. If so, the C-Intp technique will treat such a path as another form of spurious path, and additional constraints, namely conditional interpolants, will be generated by performing conditional interpolation on these additional spurious paths. Thereafter, infeasible conditional branches can be pruned by introducing these constraints into the reachable regions of states. In order to improve the efficiency of satisfiability checking and Craig interpolation [4] steps performed by C-Intp, the generated conditional interpolants are utilized to shorten the interpolation paths. To do so, the shortest C-Intp formula chains which contain only the formulas that affect decision-making are constructed at each choice point to perform the interpolations. With the conditional interpolants and shorter interpolation paths, a sufficient amount of predicates can be generated efficiently, and more attention can be paid to the analysis of feasible paths.

### 2 Software Architecture

PIChecker is developed on top of CPAchecker with the PC-DPOR and C-Intp extensions. Building on the strengths of the CPA concept, PIChecker uses different configurations as shown in Fig. 1 to cover as many concurrent programs as possible. Within the verification time-bound, the verification for a given program starts by executing the first configuration that combines the PC-DPOR technique and BDD-based reachability analysis. If a counterexample is reported, the feasibility of this error path will be checked, since the BDD-based reachability analysis in CPAchecker currently only supports the representation of integer variable values; the other states in the waitlist will continue to be explored if the counterexample is spurious. If the execution of the first configuration terminates unexpectedly within 900s, the verification will continue by using the other two CEGAR + C-Intp based configurations with different back-end solvers. In that case, the second configuration with MathSAT5 will be chosen first. If its execution also aborts abnormally because the MathSAT5 solver fails to perform interpolation on the shortest C-Intp formula chains generated by the C-Intp approach, the last configuration with the SMTInterpol solver will finally be utilized if the time cost is still within the bound.

Fig. 1. The verification flow that combines the PC-DPOR and C-Intp strategies.

### 3 Strengths and Weaknesses

Compared to CPAchecker, which conservatively approximates the independence of transitions by checking whether a transition only accesses local variables [2], the use of the CDG in PIChecker can improve the precision of estimating the dependencies of enabled transitions at reachable states. Therefore, the exploration of more traces in the same equivalence class can be avoided by utilizing PIChecker. In addition, unlike most abstraction-refinement approaches, which generate only a small number of predicates at the end of each iteration, the two CEGAR + C-Intp based configurations can effectively generate a sufficient amount of conditional interpolants within a single round of iteration by performing the conditional interpolation technique at conditional branches. Thus, the exploration of many infeasible conditional branches can be avoided. To quantify the improvement, we compare PIChecker and CPAchecker on checking the unreach-call property in the category ConcurrencySafety in SV-COMP 2023. The results indicate that PIChecker succeeds in verifying 394 out of 665 verification tasks, more than the 375 verified by CPAchecker. Further, for the 372 tasks that can be verified by both tools, the average time and memory costs of PIChecker (37.49s, 672.15MB) only account for 56.58% and 61.71% of the corresponding overheads consumed by CPAchecker (66.27s, 1089.19MB), respectively.

In order to guarantee the correctness of verification results, some conservative strategies are adopted by the three configurations. For example, when the program statement corresponding to a transition contains non-deterministic function calls (e.g., 'x = \_\_VERIFIER\_nondet\_int();'), the PC-DPOR technique conservatively considers it to be dependent on other transitions if they access the same shared variables. These strategies may significantly reduce the verification efficiency.

### 4 Tool Setup and Configuration

PIChecker is built on the CPAchecker codebase and is publicly available.<sup>1</sup> It contains all the dependent libraries and requires a Java 11 Runtime Environment. In SV-COMP 2023, PIChecker only participates in the ConcurrencySafety category and checks the unreach-call property.<sup>2</sup> Before verifying a program, all files from the submitted archive must be extracted into the same folder. Executing PIChecker on a task can be done in the same way as executing any other CPAchecker configuration by running: scripts/cpa.sh -svcomp23-pichecker -timelimit <TIME LIMIT> [-spec <SPEC FILE>] <SOURCE FILE>. The experimental statistics and verification results are written in output/Statistics.txt. Moreover, human readable counterexamples output/Counterexample.%d.txt will be generated if the reachability property does not hold. For more instructions, please refer to README.md and INSTALL.md.

### 5 Software Project and Contributors

Based on the open-source tool CPAchecker [3], PIChecker has been developed by Jie Su, Zuchao Yang, Hengrui Xing, and Jiyu Yang from the ICTT Lab in Xidian University under the supervision of Cong Tian and Zhenhua Duan. We thank Dirk Beyer and his team for their original contributions to CPAchecker. PIChecker is licensed under the Apache 2.0 license, and it also contains the copyright of CPAchecker.

Data Availability Statement. All data of SV-COMP 2023 are archived as described in the competition report [1] and available on the competition web site. This includes the verification tasks, results, witnesses, scripts, and instructions for reproduction. The version of PIChecker used in the competition is archived on Zenodo [10] and also in its own artifact at GitLab.

<sup>1</sup> PIChecker repository: https://gitlab.com/Lapulatos/pichecker.git

<sup>2</sup> The benchmark definition of PIChecker: https://gitlab.com/sosy-lab/sv-comp/bench-defs/-/blob/main/benchmark-defs/pichecker.xml

### References




## Ultimate Automizer and the CommuHash Normal Form (Competition Contribution)

Matthias Heizmann (✉), Max Barth, Daniel Dietsch, Leonard Fichtner, Jochen Hoenicke, Dominik Klumpp, Mehdi Naouar, Tanja Schindler, Frank Schüssele, and Andreas Podelski

University of Freiburg, Freiburg im Breisgau, Germany heizmann@informatik.uni-freiburg.de

Abstract. The verification approach of Ultimate Automizer utilizes SMT formulas. This paper presents techniques to keep the size of the formulas small. We focus especially on a normal form, called CommuHash normal form, that was easy to implement and had a significant impact on the runtime of our tool.

### 1 Verification Approach

Ultimate Automizer (in the following called Automizer) is a software verifier that combines a CEGAR scheme and trace abstraction [6] to check safety and liveness properties.

Automizer's algorithm begins by transforming an input program to a program automaton whose transitions are labelled with formulas representing the effects of a statement (or multiple statements), whose accepting states correspond to error locations of the input program, and whose structure is equal to the structure of the control-flow graph of the input program. This program automaton recognizes a language, where every word is a sequence of statements that leads to an error location. If the language is empty, we can conclude that the program is safe. If the language is not empty, our algorithm picks a word from the language and checks whether it is feasible (i.e., the sequence of statements corresponds to an execution of the program) or infeasible. If the word is feasible we have found an actual counterexample. If it is infeasible we compute a proof of infeasibility for this sequence of statements. Afterwards we generalize this sequence of statements to a new automaton that accepts sequences of statements whose infeasibility can be shown by the very same proof. We then subtract the automaton with the language of infeasible words from the program automaton and obtain a new automaton that represents a smaller language, with which we continue the refinement loop. An important benefit of this approach is that because we perform the refinement step purely with automata operations, we never have to mix infeasibility proofs from different iterations.
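The refinement loop can be sketched with finite sets of traces standing in for automata. This is a strong simplification and not Ultimate's implementation: real program automata usually accept infinitely many words, and the subtraction is an automata operation rather than a set difference. The callbacks `is_feasible` and `generalize` are hypothetical placeholders for the feasibility check and the proof-based generalization.

```python
from typing import Callable, Optional, Set, Tuple

Trace = Tuple[str, ...]  # a word: a sequence of statements ending in an error location


def refinement_loop(
    error_traces: Set[Trace],
    is_feasible: Callable[[Trace], bool],
    generalize: Callable[[Trace], Set[Trace]],
) -> Tuple[str, Optional[Trace]]:
    """Shrink the language of error traces until it is empty or a real bug is found."""
    remaining = set(error_traces)
    while remaining:                 # language not empty
        trace = min(remaining)       # pick some word (deterministically here)
        if is_feasible(trace):
            return "unsafe", trace   # actual counterexample
        # All traces refuted by the same infeasibility proof are removed at once.
        remaining -= generalize(trace) | {trace}
    return "safe", None              # empty language: no error trace left


traces = {("a", "b"), ("a", "c"), ("d",)}
same_first_statement = lambda tr: {t for t in traces if t[0] == tr[0]}
print(refinement_loop(traces, lambda tr: False, same_first_statement))
```

Because each iteration only removes words from `remaining`, the infeasibility proofs of different iterations never need to be combined, mirroring the benefit stated above.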

This basic approach has not changed since the last competition. In the next section we explain improvements for the handling of SMT formulas.

### 2 SMT formulas in Ultimate

The Ultimate program analysis framework, on which Ultimate Automizer is built, uses SMT formulas to represent the effect of program statements and to represent sets of states. We call formulas that represent sets of states state assertions. State assertions play a major role in the verification approach of Automizer. The infeasibility proof that we infer for each infeasible sequence of states is a sequence of state assertions and in the generalization step of the overall verification algorithm we have to check thousands of Hoare triples of the form {φ}st{ψ}, where φ and ψ are state assertions from infeasibility proofs. In order to check these Hoare triples, we reduce the validity problem for Hoare triples to a satisfiability problem for SMT formulas and let an SMT solver decide the satisfiability. The costs for the overall verification algorithm would be dominated by the costs for these satisfiability checks if we did not take additional actions to keep the size of the SMT formulas low.
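The reduction from validity to satisfiability can be illustrated as follows: a triple {φ}st{ψ} is valid exactly when φ(s) ∧ st(s, s′) ∧ ¬ψ(s′) is unsatisfiable. The sketch below is only an illustration under simplifying assumptions: it uses a single integer variable, a deterministic statement modelled as a function, and a brute-force search over a small domain in place of an SMT solver. The function name is hypothetical.

```python
from typing import Callable, Iterable, Optional, Tuple


def hoare_counterexample(
    pre: Callable[[int], bool],
    stmt: Callable[[int], int],
    post: Callable[[int], bool],
    domain: Iterable[int],
) -> Optional[Tuple[int, int]]:
    """Search for a state satisfying pre(x) and not post(stmt(x))."""
    for x in domain:
        x_after = stmt(x)
        if pre(x) and not post(x_after):
            return (x, x_after)  # "sat": the triple is not valid
    return None                  # "unsat" on this domain


domain = range(-50, 51)
# {x >= 0} x := x + 1 {x >= 1}
print(hoare_counterexample(lambda x: x >= 0, lambda x: x + 1, lambda x: x >= 1, domain))  # None
# {x >= 0} x := x - 1 {x >= 0}
print(hoare_counterexample(lambda x: x >= 0, lambda x: x - 1, lambda x: x >= 0, domain))  # (0, -1)
```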

We infer the sequence of state assertions by Craig interpolation or by a symbolic execution (via strongest post and weakest precondition) that is supported by unsatisfiable cores [3]. In the latter case the state assertions are usually quantified and we try to get rid of these quantifiers by applying several quantifier elimination techniques. These quantifier elimination techniques make the formulas simpler for SMT solvers but increase their size.

Our most powerful technique for reducing the size of formulas is an algorithm [4] that removes subformulas if the removal does not change the models of the formula. This algorithm however is itself costly because it calls an SMT solver for each subformula.

In order to also reduce the size of formulas without additional SMT solver calls, we utilize the following optimizations whenever we construct a formula.


### 3 The CommuHash Normal Form

A consequence of the quantifier elimination techniques and the optimizations mentioned above is that we construct formulas in many places of our code. A side effect of this is that we get formulas with subformulas that differ only in the order of the parameters of a commutative operator. E.g., we saw formulas like i = k ∨ k ≠ i or a[i+k] = a[k+i]. For both formulas the logical equivalence to true would have been detected if the operands of the commutative operators + and = had not occurred in different orders. To minimize this problem we define a normal form that we call CommuHash Normal Form (CHNF). This normal form utilizes the fact that in Ultimate every formula has a 32-bit hash code. We say that an SMT formula is in CommuHash Normal Form if for every subformula with a commutative operator the operands are sorted according to their hash code in ascending order. To ensure that every formula is in CHNF, Ultimate sorts the parameters whenever we construct a term whose operator is one of the following SMT operators: =, distinct, and, or, xor, +, \*, bvadd, bvmul, bvand, bvor, bvxor.
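The idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Ultimate's Java implementation: terms are modelled as nested tuples, the 32-bit hash code is taken to be a CRC-32 of the term's printed form, and the tie-break on the printed form as well as the names `term_hash` and `mk_term` are hypothetical.

```python
import zlib

# Commutative SMT operators listed in the text.
COMMUTATIVE = {"=", "distinct", "and", "or", "xor", "+", "*",
               "bvadd", "bvmul", "bvand", "bvor", "bvxor"}


def term_hash(term):
    """Deterministic 32-bit hash of a term (assumed stand-in for Ultimate's hash code)."""
    return zlib.crc32(repr(term).encode("utf-8"))


def mk_term(op, *args):
    """Construct a term; operands of commutative operators are sorted by hash code."""
    if op in COMMUTATIVE:
        # Ascending hash order; the printed form breaks ties in case of a collision.
        args = tuple(sorted(args, key=lambda t: (term_hash(t), repr(t))))
    return (op, *args)


# a[i+k] = a[k+i]: both sides become the same term once + is normalized.
lhs = mk_term("select", "a", mk_term("+", "i", "k"))
rhs = mk_term("select", "a", mk_term("+", "k", "i"))
equation = mk_term("=", lhs, rhs)
trivially_true = equation[1] == equation[2]
```

Because sorting happens at construction time, syntactically different but commutatively equal operands collapse to one representation, so a purely syntactic check such as `equation[1] == equation[2]` suffices to recognize the equation as true without an SMT solver call.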

In order to evaluate the effect of the CommuHash Normal Form, we conducted an experiment in which we compared the default version of Ultimate Automizer to a version in which we disabled the sorting of parameters. We ran both versions on the benchmarks of the MemSafety category. In this category we typically have to deal with large formulas because the state assertions of proofs have to encode alias information about the program's pointers. We ran both versions on all 3440 benchmarks of the category. The CPU was an AMD Ryzen Threadripper 3970X, the time limit was 90s, the memory limit was 8000 MB, and for each benchmark two CPU cores were used. Neither run produced incorrect results. The run without CHNF produced 1347 correct results; the run with CHNF produced 1439 correct results. Figure 1 shows a comparison of the runtimes for each benchmark in which at least one setting produced a result. We see that on average the run with CHNF needs less time. In fact, on average the speedup is 31%.

Fig. 1: Comparison of the runtime with and without CHNF

### 4 Project, Setup and Configuration

Automizer is a part of the open-source program analysis framework Ultimate<sup>1</sup>. Both are written in Java and licensed under LGPLv3. We use version 0.2.3 of Automizer [5] for SV-COMP, which requires Java 11 and Python 3.6. The release 0.2.3 contains binaries for Automizer and the SMT solvers Z3, CVC4, and Mathsat, as well as the Python wrapper script Ultimate.py. The Python script provides an interface to the competition environment, in particular to the BenchExec<sup>2</sup> tool-info module ultimateautomizer.py. Automizer also participates as witness validator and can validate violation [2] or correctness witnesses [1]. We participate in all categories<sup>3</sup> as verifier, but our witness validator does not yet support concurrency witnesses. Hence, our validator does not participate in ConcurrencySafety<sup>4</sup>.

Automizer can be run by calling

./Ultimate.py --spec prop.prp --file input.c --architecture 32bit|64bit --full-output [--validate witness.graphml]

where prop.prp is the SV-COMP property file, input.c is the C file that should be analyzed, 32bit or 64bit is the architecture of the input file, and --full-output enables writing of verbose output to stdout. The witness that should be validated is specified with --validate. If Automizer generates a result, a witness is written to the file witness.graphml. Automizer's output is always written to the file Ultimate.log.

### References


<sup>1</sup> https://github.com/ultimate-pa/ultimate

<sup>2</sup> https://github.com/sosy-lab/benchexec

<sup>3</sup> Specified by uautomizer.xml at https://github.com/sosy-lab/sv-comp.

<sup>4</sup> Specified by uautomizer-validate-\*-witnesses.xml.


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## Ultimate Taipan and Race Detection in Ultimate (Competition Contribution)

Daniel Dietsch<sup>⋆</sup>, Matthias Heizmann, Dominik Klumpp (✉), Frank Schüssele, and Andreas Podelski

> University of Freiburg, Freiburg im Breisgau, Germany klumpp@informatik.uni.freiburg.de

Abstract. Ultimate Taipan integrates trace abstraction with algebraic program analysis on path programs. Taipan supports data race checking in concurrent programs through a reduction to reachability checking. Though the subsequent verification is not tuned for data race checking, the results are encouraging.

### 1 Verification Approach

Ultimate Taipan [6,7] verifies programs using an approach based on trace abstraction [8]. The program is represented as a control flow automaton: Letters correspond to program statements, accepting states correspond to error locations, and accepted words are error traces. The verification consists of proving that all error traces are infeasible (they cannot be executed). To this end, Taipan picks an error trace from the control flow automaton, and computes the corresponding path program, i.e., the projection of the program on the statements in the trace. Taipan then uses symbolic interpretation with fluid abstractions [6], a variant of algebraic program analysis, to prove correctness of this path program. If this fails, the algorithm falls back to an interpolation-based method to prove correctness of the trace itself. In either case, the resulting predicates are used to build a Floyd/Hoare-automaton [8] that accepts a regular language of infeasible traces. This automaton is subtracted from the program's control flow automaton, yielding a refined abstraction. Taipan repeats this procedure in a loop until it finds a feasible error trace (the program is incorrect) or the abstraction is empty (all error traces are infeasible, the program is correct).

For concurrent programs, Taipan performs a naïve sequentialization, and considers the interleaving product of all threads as a (nondeterministic) sequential program. Verification then proceeds on this program as it would for any other sequential program. Note that this also affects the notion of path program, i.e., path programs are also just sequential programs.
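As a rough illustration of what the interleaving product contains, the following Python sketch enumerates all interleavings of two straight-line threads. It works on plain statement lists rather than on control flow automata, and the function name and statement labels are hypothetical.

```python
def interleavings(thread_a, thread_b):
    """Yield every interleaving of two statement sequences, preserving per-thread order."""
    if not thread_a:
        yield list(thread_b)
        return
    if not thread_b:
        yield list(thread_a)
        return
    # Either thread A or thread B executes its next statement first.
    for rest in interleavings(thread_a[1:], thread_b):
        yield [thread_a[0]] + rest
    for rest in interleavings(thread_a, thread_b[1:]):
        yield [thread_b[0]] + rest


schedules = list(interleavings(["a1", "a2"], ["b1", "b2"]))
```

Two threads with two statements each already yield six schedules; the count grows combinatorially with thread length and number, which is why the sequentialized program is treated as nondeterministic rather than enumerated explicitly.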

Taipan is part of the Ultimate framework, and uses the same front-end as other Ultimate tools. C programs are first translated to the intermediate verification language Boogie [10]; the resulting Boogie program is converted into a control flow automaton, which is then verified. The translation from C to Boogie models heap and stack memory through Boogie arrays (associative maps), where pointers correspond to indices. To simplify the subsequent verification, any variables, arrays and structures that are guaranteed to never be accessed through a pointer are instead translated to corresponding Boogie variables.

<sup>⋆</sup> Jury Member: Daniel Dietsch

© The Author(s) 2023

S. Sankaranarayanan and N. Sharygina (Eds.): TACAS 2023, LNCS 13994, pp. 582–587, 2023. https://doi.org/10.1007/978-3-031-30820-8_40

### 2 From Data Races to Reachability

Since SV-COMP'22, Taipan can check for data races in concurrent programs. A program written in C contains a data race if there are two different threads such that (i) one thread writes to a memory location and the other thread writes to or reads from the same memory location, (ii) at least one of the accesses is not atomic, and (iii) neither access happens-before the other. The C standard [9], section 5.1.2.4, gives the precise definition. Data races constitute undefined behaviour.

Ultimate supports data race checking through a reduction to reachability. This reduction is implemented as part of our translation from C to our custom Boogie dialect. Contrary to C, data races do not constitute undefined behaviour in our Boogie dialect. The semantics prescribes that "simple" Boogie statements – (nondeterministic) assignments and assume statements – execute atomically. We consider all interleavings of these atomic statements, i.e., we assume sequential consistency. Hence the correctness of the generated Boogie programs is well-defined, even if the input C program has undefined behaviour. Any verification algorithm for concurrent programs can be applied to the resulting Boogie program, including the algorithm implemented by Taipan.

The reduction to reachability proceeds as follows. For every global variable x, we introduce a fresh Boolean global variable race_x, which tracks read and write accesses to x. By comparing the current value of race_x to some value it previously held, we can detect if x has been accessed since. We call an atomic Boogie statement that represents a C statement or an evaluation step for a C expression an action. Let <read(x)> denote an action that reads the value of x, and let <write(x)> denote an action that assigns a new value to x. Our translation wraps such actions in data race detection code as shown in the following listings, where tmp is a Boolean, thread-local variable.


For an action a, we call the sequence of Boogie statements that results from this wrapping block(a). Note that a is always contained in block(a). Our translation ensures that if an action a is part of an atomic block (delimited by \_\_VERIFIER\_atomic\_\*), then the entire block(a) falls inside that atomic block.

For two actions a and b, we say that block(b) can interrupt block(a) if there exists a program execution that executes block(a) up to and including the action a, then fully executes block(b), and then continues to execute the remaining assert statement of block(a). Hence, block(a) can interrupt block(b) or vice versa if and only if at least one of the actions a or b is not atomic, and neither happens-before [9] the other.

For an action a, the assert statement in block(a) cannot fail, unless there is an action b such that (i) block(b) can interrupt block(a), and (ii) a and b both access the same variable x. For instance, let a be an action that writes to x, and let b be an action that reads from x. In the following example, block(b) can interrupt block(a) and the last assert statement can fail because false can be chosen as value of tmp.

Based on the definition of data races we distinguish three cases for the actions a and b:


From this case distinction we conclude that in the translated Boogie program, an assert statement added for data race detection can fail if and only if the original C program contains a data race.
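The case distinction can be replayed on a small model. The Python sketch below assumes one particular instrumentation that is consistent with the description above but is not taken from the listings: a read stores true in race_x, a write stores a nondeterministically chosen tmp in race_x, and each block ends by asserting that race_x still holds the value the block stored. The function names are hypothetical.

```python
from itertools import product


def marker_choices(kind):
    """Values a block may store in race_x (assumed encoding, see lead-in)."""
    # read: fixed marker True; write: nondeterministic thread-local tmp
    return [True] if kind == "read" else [False, True]


def assert_can_fail(kind_a, kind_b):
    """Can the final assert of block(a) fail when block(b) interrupts block(a)?

    Both actions are assumed to access the same variable x.
    """
    for m_a, m_b in product(marker_choices(kind_a), marker_choices(kind_b)):
        race_x = m_a       # block(a) stores its marker, then action a executes
        race_x = m_b       # block(b) runs completely; its own assert holds
        if race_x != m_a:  # remaining assert of block(a): race_x == m_a
            return True
    return False


cases = {(a, b): assert_can_fail(a, b)
         for a in ("read", "write") for b in ("read", "write")}
```

Under this assumed encoding, an interrupting read never disturbs another read, while any pair involving a write admits a choice of tmp that makes the assertion fail, matching the definition of a data race.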

Our encoding is independent of the synchronization mechanisms used to rule out data races. Whether the program uses \_\_VERIFIER\_atomic\_\*, pthread mutexes, or directly implements locking mechanisms, no special handling is needed. Our implementation supports not only (primitive) global variables, but also data on the heap (accessed through pointers) as well as off-heap structures and arrays. In such cases, instead of a Boolean variable race_x, more complicated data structures are needed. We mirror the data layout with Boolean fields: For every data array, there exists a corresponding Boolean array, for every structure, there is a corresponding structure with Boolean-valued fields, etc.

This handling of complex data types also allows us to deal with aliasing issues: Ultimate models memory as an associative array mem : [Pointer]Int, with pointers as indices. Our race detection encoding creates a corresponding Boolean-valued associative array race_mem : [Pointer]Boolean. The instrumentation for an access to a memory location through a pointer p then manipulates the entry race_mem[p]. If pointers p and q point to the same memory location ℓ at runtime, then race_mem[p] and race_mem[q] refer to the same array entry. Hence, if there is a data race on ℓ, one of the generated assert statements can fail.

### 3 Strengths and Weaknesses

Our encoding of data races is independent of the subsequent verification algorithm. We have employed this encoding since SV-COMP 2022 [2], for Taipan as well as in the Ultimate tools Automizer and GemCutter (Ultimate Kojak currently does not support concurrency).

We inherit limitations of the respective verification algorithms. Taipan is unable to prove correctness of programs with an unbounded (or very high) number of threads. The NoDataRace category contains many such programs. Overall, the Ultimate tools perform competitively in the NoDataRace-Main category, with Automizer, GemCutter and Taipan reaching 4th, 5th and 6th place, respectively. In comparison with last year's performance in the demo category (4th, 1st and 2nd place), a major factor seems to be the large number of new correct benchmarks, where we do not perform as well yet. Perhaps some tuning of the subsequent verification algorithms to the detection of data races can lead to improvements in the future.

The presented encoding of data races as reachability is compositional, and independent of the number of threads that are running concurrently: We always add a single assertion per access, in contrast to some other methods [4].

One limitation of our implementation is that, from a feasible trace that ends in an assertion violation, it is not always immediately clear which accesses have a data race. In order to support violation witnesses for data races in future editions of SV-COMP, a more detailed analysis of the trace will be needed.

Our performance suffers in some cases due to a large amount of instrumentation, e.g. in benchmarks where large structs are copied: Currently, we handle each byte in the struct separately. In the future, we hope to improve the implementation to (i) handle reads and writes of large memory chunks more efficiently, (ii) detect more situations in which a concurrent access can be easily ruled out, so that no instrumentation is needed, and (iii) make parts of the generated data race detection code atomic, thus reducing the number of interleavings.

### 4 Architecture, Setup, Configuration, and Project

Ultimate Taipan is part of Ultimate<sup>1</sup> , a program analysis framework written in Java and licensed under LGPLv3<sup>2</sup> . Taipan version 0.2.2-2329fc70 requires Java 11 and Python 3.6. The submitted .zip archive contains the Linux version of Taipan, binaries of the required SMT solvers<sup>3</sup> , and a Python wrapper script. Taipan is invoked with

./Ultimate.py --spec <p> --file <f> --architecture <a> --full-output

where <p> is an SV-COMP property file, <f> is an input C file, <a> is the data model (32bit or 64bit), and --full-output enables verbose output to stdout. A violation or correctness witness may be written to the file witness.graphml. The benchmarking tool BenchExec [3] supports Taipan through the tool-info module ultimatetaipan.py<sup>4</sup>. Taipan participates in all categories, as declared in its SV-COMP benchmark definition file utaipan.xml<sup>5</sup>.

<sup>1</sup> ultimate.informatik.uni-freiburg.de and github.com/ultimate-pa/ultimate

<sup>2</sup> www.gnu.org/licenses/lgpl-3.0.en.html

<sup>3</sup> Z3 (github.com/Z3Prover/z3), CVC4 (cvc4.github.io/) and Mathsat (mathsat.fbk.eu)

<sup>4</sup> github.com/sosy-lab/benchexec/blob/main/benchexec/tools/ultimatetaipan.py

<sup>5</sup> gitlab.com/sosy-lab/sv-comp/bench-defs/-/blob/main/benchmark-defs/utaipan.xml

Data Availability. Ultimate Taipan can be found in the archive of all verifiers and validators participating in SV-COMP'23 [1]. Additionally, the .zip archive containing only Taipan is available online<sup>6</sup> and on Zenodo [5].

### References


<sup>6</sup> gitlab.com/sosy-lab/sv-comp/archives-2023/-/blob/main/2023/utaipan.zip


## VeriAbsL: Scalable Verification by Abstraction and Strategy Prediction (Competition Contribution)

Priyanka Darke<sup>1,⋆</sup> (✉), Bharti Chimdyalwar<sup>1</sup>, Sakshi Agrawal<sup>1</sup>, Shrawan Kumar<sup>1</sup>, R Venkatesh<sup>1</sup>, and Supratik Chakraborty<sup>2</sup>

> <sup>1</sup> TCS Research, Pune, India priyanka.darke@tcs.com <sup>2</sup> Indian Institute of Technology, Bombay, India

Abstract. We present VeriAbsL, a reachability verifier that performs verification in three stages. First, it slices the input code using a combination of two slicers; then it verifies the slices using predicted strategies; and finally it composes the results of verifying the individual slices. We introduce a novel shallow slicing technique that uses variable reference information of the program, and data and control dependencies of the entry function, to generate slices. We also introduce a novel strategy prediction technique that uses machine learning to predict a strategy. It uses boolean features to describe a program to a neural network that predicts a strategy. We use the portfolio of VeriAbs, a reachability verifier with manually defined strategies. In SV-COMP 2023, VeriAbsL verified 227<sup>3</sup> more programs than VeriAbs, and 475<sup>3</sup> programs that VeriAbs could not verify.

### 1 Verification Approach

It is folklore in automated software verification that no single verification technique is good enough to verify all programs of interest. This limitation led to the advent of strategy selection-based verifiers that use predefined verification strategies [4]. A strategy is a sequence of verification techniques applied to a program, where each technique is bounded by a heuristically defined time limit. In this paper, we present a strategy prediction-based reachability verifier for C programs called VeriAbsL. It verifies a program in stages using a portfolio of two slicing and ten verification techniques. First, it slices a program using a sequence of slicers. Then it uses a few syntactic and semantic features of the slice to predict a strategy and verify the slice. Lastly, it composes the results of verifying each slice. VeriAbsL uses a sequential combination of two slicers, a slicer-analyzer [7] and a novel shallow slicer, or Sslicer. Sslicer is applied to programs that could not be sliced by the slicer-analyzer. The slicer-analyzer is more efficient than Sslicer, but applies to a smaller class of programs, as explained in Section 1.2. Let a program P be sliced into n slices. A strategy prediction module extracts the features of each slice P<sub>i</sub>, 1 ≤ i ≤ n, and predicts a strategy for it using a neural network. The program P is safe if each slice P<sub>i</sub> is safe, and P is unsafe if any slice P<sub>i</sub> is unsafe. If program P cannot be sliced, then a strategy is predicted for P itself. Fig. 1 shows the architecture of VeriAbsL. As shown, VeriAbsL uses the portfolio of a strategy selection-based verifier called VeriAbs [7].

<sup>⋆</sup> P. Darke: Jury member

<sup>3</sup> Without witness validation.

© The Author(s) 2023

S. Sankaranarayanan and N. Sharygina (Eds.): TACAS 2023, LNCS 13994, pp. 588–593, 2023. https://doi.org/10.1007/978-3-031-30820-8_41

Fig. 1. VeriAbsL Architecture (S: Program Safe, F: Property Fails, U: Unknown)
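The rule for composing per-slice results (P is safe if every slice is safe, unsafe if any slice is unsafe) can be written down directly. The Python sketch below uses the three verdicts from the caption of Fig. 1; returning Unknown when some slice is Unknown and none fails is an assumption, as are the names `Verdict` and `compose`.

```python
from enum import Enum


class Verdict(Enum):
    SAFE = "S"      # program safe
    FAILS = "F"     # property fails
    UNKNOWN = "U"   # no result


def compose(slice_verdicts):
    """Combine the verdicts of the slices P_1..P_n into a verdict for P."""
    verdicts = list(slice_verdicts)
    if any(v is Verdict.FAILS for v in verdicts):
        return Verdict.FAILS    # one unsafe slice makes P unsafe
    if verdicts and all(v is Verdict.SAFE for v in verdicts):
        return Verdict.SAFE     # every slice safe makes P safe
    return Verdict.UNKNOWN      # assumed: otherwise no conclusion


result = compose([Verdict.SAFE, Verdict.UNKNOWN, Verdict.FAILS])
```

Checking for a failing slice first means a single counterexample decides the result even if other slices time out.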

#### 1.1 Strategy Prediction using Machine Learning (ML)

Despite the advantages of sequencing multiple verification techniques in a strategy, experimental evidence indicates that each strategy works well for only a class of programs. When a new class is encountered, experts define a new strategy and update the strategy-selection algorithm of the verifier. This is a tedious task. In order to automate it, recently ML-based verifiers have been used with partial success [5]. VeriAbsL is one such verifier. It uses a simple ML-based approach explained as follows.

Feature Vector Generation. VeriAbsL uses a feature vector f of 22 boolean features that describe a few semantic or syntactic constructs of the input slice P<sub>j</sub>. For example, a boolean feature f<sub>i</sub> ∈ f, if set to true, can indicate the presence of arrays in the input code, and false can indicate that no arrays are used. These features are computed using a light-weight static analysis, and are derived from those presented in [8].

Neural Network. VeriAbsL uses a three-layered neural network with multi-class classification, one class for each of the ten techniques in our portfolio. It has 22, 17, and 10 neurons in the respective layers. It was trained using ReLU for the hidden layer and softmax for the output layer as activation functions, and with the mean-squared error loss function. It translates an input feature vector f representing program slice P<sub>j</sub> into likelihoods of success l<sub>i</sub>, 1 ≤ i ≤ 10, of the corresponding verification techniques T<sub>i</sub> in the portfolio for slice P<sub>j</sub>. Each output node n<sub>i</sub> of the neural network represents one verification technique T<sub>i</sub>, and the value l<sub>i</sub> generated by the network at that node n<sub>i</sub> is a heuristic measure of the relative likelihood that technique T<sub>i</sub> will successfully verify/disprove the property for slice P<sub>j</sub> within 900 seconds.
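The shape of such a network can be illustrated with a forward pass in plain Python. This is a sketch only: the weights are random placeholders (the trained weights are not given in the text), the feature values are made up, and VeriAbsL itself uses TensorFlow rather than hand-written code.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Hypothetical random weights and zero biases for one dense layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def affine(weights, biases, x):
    """Compute W x + b for one dense layer."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def softmax(values):
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def forward(features, hidden, output):
    """Map 22 boolean features to 10 relative likelihoods of success."""
    x = [1.0 if f else 0.0 for f in features]
    h = relu(affine(*hidden, x))          # 17 hidden neurons with ReLU
    return softmax(affine(*output, h))    # 10 output neurons with softmax


rng = random.Random(0)
hidden = init_layer(22, 17, rng)
output = init_layer(17, 10, rng)
features = [i % 3 == 0 for i in range(22)]   # made-up feature vector
likelihoods = forward(features, hidden, output)
```

The softmax output is a vector of ten positive values summing to one, one per technique in the portfolio.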

Strategy Prediction. A strategy (T<sub>k<sub>1</sub></sub>, ..., T<sub>k<sub>10</sub></sub>), 1 ≤ k<sub>r</sub> ≤ 10, 1 ≤ r ≤ 10, is created by sorting the relative likelihoods of success l<sub>i</sub> of each verification technique T<sub>i</sub> in decreasing order. The techniques T<sub>i</sub> are invoked in that order to verify slice P<sub>j</sub>.
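Turning the likelihoods into a strategy amounts to a sort. A minimal Python sketch, with a hypothetical function name and made-up likelihood values:

```python
def predict_strategy(likelihoods):
    """Return technique indices (1-based) ordered by decreasing likelihood of success."""
    indices = range(1, len(likelihoods) + 1)
    # Python's sort is stable, so techniques with equal likelihood keep index order.
    return sorted(indices, key=lambda i: likelihoods[i - 1], reverse=True)


strategy = predict_strategy([0.10, 0.70, 0.20])
```

For the three made-up values the resulting order is T<sub>2</sub>, T<sub>3</sub>, T<sub>1</sub>; with the real network the list would have ten entries.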

Experimental Results. The neural network in VeriAbsL was trained on 800 randomly selected SV-COMP 2022 ReachSafety benchmarks. At SV-COMP 2023, out of all 6138 benchmarks, VeriAbsL verified 227 more programs in 4.4% less time than VeriAbs<sup>4</sup> and verified 475 programs that VeriAbs could not verify<sup>3</sup>. This was because VeriAbsL predicted useful techniques early in its strategies, while VeriAbs selected unsuitable strategies and ran out of time. Further, the randomly selected training data did not contain any benchmarks from three ReachSafety sub-categories, namely Combinations, ProductLines, and Hardware. VeriAbsL verified 72 more programs than VeriAbs in these three sub-categories, demonstrating that strategy prediction in VeriAbsL generalizes to programs for which it was not trained. VeriAbsL ran out of time for 248 programs verified by VeriAbs because the randomly selected training data did not contain any sample corresponding to two techniques, namely Vajra [6] and Counter-Example Guided Loop Abstraction Refinement (CEGLAR) [4], needed to verify the 248 programs. Thus they were always predicted late. Further, VeriAbsL verified 1047 and 543 more benchmarks compared to the other ML-based strategy prediction tools, Graves [11] and PeSCo [12], respectively.

Strengths and Weaknesses of Strategy Prediction. VeriAbsL can verify more programs than VeriAbs in spite of the same portfolio because it uses ML for strategy prediction. Also, VeriAbsL demonstrates that a small set of boolean features can be used successfully to verify programs, while other successful verifiers predict a strategy using graph-based learning methods [12]. However, VeriAbsL does not incorporate a feedback mechanism that can penalize a technique if it cannot verify a program. Such a feedback mechanism can improve its efficiency and accuracy.

#### 1.2 Shallow Slicer

Sslicer is a generalization of the slicer-analyzer presented in [7] and like the latter, aims for a scalable slicing with respect to calls in entry function main. But unlike the slicer-analyzer, Sslicer allows multiple calls in main to (1) refer to the same global variable, (2) transitively invoke the same function, or (3) have transitive dependence on the same data element or control structure in main.

Sslicer partitions the program functions directly or indirectly called from main into n sets F<sub>1</sub>, ..., F<sub>n</sub> such that the following conditions, termed partition-independence, are satisfied: (1) Each partition F<sub>i</sub> contains at least one function directly called from main. (2) Each partition F<sub>i</sub> contains functions which are either directly or transitively called from main. (3) All functions transitively called from function f ∈ F<sub>i</sub> also belong to F<sub>i</sub>, the same partition as f. Thus, if T(f) is the set of functions transitively called from f, then ∀i, 1 ≤ i ≤ n, ∀f ∈ F<sub>i</sub>, T(f) ⊆ F<sub>i</sub>. (4) No two functions f ∈ F<sub>i</sub> and g ∈ F<sub>j</sub> belonging to different partitions transitively call the same function or refer to the same global variable. Let V(F<sub>i</sub>) be the set of global variables referred to by functions in set F<sub>i</sub>; then ∀i, j | 1 ≤ i ≤ n, 1 ≤ j ≤ n, i ≠ j ⟹ (V(F<sub>i</sub>) ∩ V(F<sub>j</sub>) = ∅). (5) Let main<sub>i</sub> be the function generated when a program containing only one function, the function main, is sliced (using known slicing techniques [9]) with respect to calls to functions in set F<sub>i</sub> which are directly called from main. Then functions of no other set F<sub>j</sub>, i ≠ j, should refer to the variables used in main<sub>i</sub>. Thus ∀i, j | 1 ≤ i ≤ n, 1 ≤ j ≤ n, i ≠ j ⟹ (V(main<sub>i</sub>) ∩ V(F<sub>j</sub>) = ∅). (6) n is the largest possible natural number satisfying the above conditions.

<sup>4</sup> The competition score of VeriAbs is greater than VeriAbsL because of 8 incorrect results produced due to bugs in the implementation of a technique predicted by VeriAbsL. This technique was not executed for these 8 programs by VeriAbs.

A slice P<sub>i</sub> corresponding to each set F<sub>i</sub> is generated. The set of functions in slice P<sub>i</sub> is given by main<sub>i</sub> ∪ F<sub>i</sub>. To create the slice, call graph and referred variables information is computed using call-trees and a light-weight flow-insensitive pointer analysis. We assume that function main itself is not a part of any recursive call chain, and does not specify the assertions directly.


Fig. 2. Example

Example. Consider the program presented in Fig. 2a. In this example, functions called from main can be initially partitioned into three sets {f1, f3}, {f2, f4}, and {f5}, as f1 calls f3, f2 calls f4, and f5 does not refer to any function or variable that other functions refer to. But function f4 refers to variable a. If a program containing only the body of function main shown in Fig. 2a were to be sliced with respect to the call to f1 in main, then it would refer to variable a. Function f1 belongs to the first partition and f4 to the second. To satisfy the fifth condition of partition-independence, functions f1 and f4 must belong to a single partition. Thus finally there are two partitions: {f1, f2, f3, f4} and {f5}. The slices created for the first and second partitions are shown in Figures 2b and 2c respectively. Notice that since function f5 does not refer to variable b in its body, it need not be merged with the other partition even though the body of sliced main in Fig. 2c refers to variable b.
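The example can be replayed with a small Python sketch that starts with one group per function directly called from main and merges groups whenever conditions (4) or (5) would be violated. It is a simplification, not VeriAbsL's implementation: the variables used by each sliced main are supplied as input rather than computed by a slicer, the sliced main of a merged group is approximated by the union of the individual ones, and all names and the input encoding of Fig. 2a are assumptions reconstructed from the description above.

```python
def transitive_callees(f, calls):
    """Return f together with every function reachable from f in the call graph."""
    seen, stack = set(), [f]
    while stack:
        g = stack.pop()
        if g not in seen:
            seen.add(g)
            stack.extend(calls.get(g, ()))
    return seen


def partition(direct_calls, calls, globals_of, main_vars):
    """Greedily merge groups until partition-independence holds (a sketch)."""
    groups = []
    for f in direct_calls:
        funcs = transitive_callees(f, calls)            # condition (3)
        used = set()
        for g in funcs:
            used |= set(globals_of.get(g, ()))
        groups.append({"funcs": funcs, "vars": used,
                       "main_vars": set(main_vars.get(f, ()))})

    def conflict(a, b):
        return bool(a["funcs"] & b["funcs"] or a["vars"] & b["vars"]   # condition (4)
                    or a["main_vars"] & b["vars"]                      # condition (5)
                    or b["main_vars"] & a["vars"])

    changed = True
    while changed:
        changed = False
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                if conflict(groups[i], groups[j]):
                    for key in ("funcs", "vars", "main_vars"):
                        groups[i][key] |= groups[j][key]
                    del groups[j]
                    changed = True
                    break
            if changed:
                break
    return [sorted(g["funcs"]) for g in groups]


# Assumed encoding of the example: f1 calls f3, f2 calls f4, f4 refers to a,
# main sliced for f1 uses a, main sliced for f5 uses b.
CALLS = {"f1": ["f3"], "f2": ["f4"]}
GLOBALS = {"f4": {"a"}}
MAIN_VARS = {"f1": {"a"}, "f5": {"b"}}
parts = partition(["f1", "f2", "f5"], CALLS, GLOBALS, MAIN_VARS)
# parts == [["f1", "f2", "f3", "f4"], ["f5"]]
```

Merging only when a conflict exists keeps the number of groups as large as this greedy procedure allows, in the spirit of condition (6).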

Experimental Results. We compare the performances of VeriAbsL with (1) slicer-analyzer, and (2) slicer-analyzer and Sslicer, on all 6138 benchmarks of the ReachSafety category of SV-COMP 2023. The first configuration generated slices for 671 programs while the second generated slices for 1369 programs, showing better applicability. Further, due to Sslicer, VeriAbsL terminated its analysis for 42 more programs, showing improved scalability, and its portfolio could verify 4 additional programs.

#### 1.3 Software Project, Architecture, and Setup

The Foundations of Computing research group at TCS Research [1] has developed VeriAbsL. It is written in Perl, Java and Python. It uses TCS's program analysis framework [10] for static analysis, and TensorFlow libraries [2] for learning. VeriAbsL uses VeriAbs's portfolio [7], except Vajra [6] because it is not supported on Ubuntu 22.04 LTS. VeriAbsL participated in the ReachSafety category at SV-COMP 2023, and is available at [3]. The installation instructions are in VeriAbsL/INSTALL.txt, the BenchExec<sup>5</sup> wrapper script for the tool is veriabsl.py, and the benchmark definition file is veriabsl.xml. On successful verification, VeriAbsL generates a witness in the current working directory as witness.graphml. A sample command to verify the property given in file reach-safety.prp for a program, given in a.c, of a 32-bit (or 64-bit) architecture is as follows:

VeriAbsL/scripts/veriabs -32|64 --property-file reach-safety.prp a.c

<sup>5</sup> https://github.com/sosy-lab/benchexec

### 2 Data-Availability Statement

VeriAbsL is available as part of the SV-COMP 2023 verifier repository at https://gitlab.com/sosy-lab/sv-comp/archives-2023/-/blob/main/2023/veriabsl.zip. For any queries please contact the authors at veriabs.tool@tcs.com.

### References



## VeriFuzz 1.4: Checking for (Non-)termination (Competition Contribution)

Ravindra Metta, Prasanth Yeduru, Hrishikesh Karmarkar, and Raveendra Kumar Medicherla<sup>⋆</sup> (✉)

TCS Research, Tata Consultancy Services, Pune, India {r.metta,prasanth.yeduru,hrishikesh.karmarkar,raveendra.kumar}@tcs.com

Abstract. In VeriFuzz 1.4, we implemented two new techniques for checking Non-termination and Termination. VeriFuzz 1.4 won the Termination category of SV-COMP 2023.

### 1 Approach for Non-termination and Termination

VeriFuzz 1.2.0 [4,10,11] is a framework for automatically generating test cases; it lacks the ability to prove properties such as termination. Given a program P and termination as the property, a tool needs to either provide a witness for Non-termination of P, or give a true verdict if P always terminates. Therefore, we developed two techniques: one for proving Non-termination and one for checking termination with high confidence. Both are described below.

<sup>⋆</sup> Jury member

© The Author(s) 2023

S. Sankaranarayanan and N. Sharygina (Eds.): TACAS 2023, LNCS 13994, pp. 594–599, 2023. https://doi.org/10.1007/978-3-031-30820-8_42

#### 1.1 Technique for Non-termination Checking

For SV-COMP 2023, we implemented a variant of FuzzNT [7], a sound technique for proving Non-termination arising due to infinite loops. FuzzNT takes as input a C program P and a corpus of test inputs T generated using the Coverage Guided Fuzzer of VeriFuzz 1.2. Each test input t ∈ T is a sequence of values to be supplied to P via nondet() calls. We illustrate the key steps of FuzzNT using the program P (Listing 1.1), adapted from the code that caused the SSL nontermination [13]. Note that P terminates on the test input t = ⟨1 : j = 129, 4 : i == 1, 5 : j = 5, 4 : i == 3⟩. Given such a test input, FuzzNT transforms P into a Path Specific Program (PSP) P′ (Listing 1.2), by replacing each nondet() call in P with the corresponding value in the test input, if any, as described in [7]. If multiple values in the test input correspond to a nondet() call in P, FuzzNT picks the first value among them to replace the nondet() call. For example, in t, both i == 1 and i == 3 correspond to the nondet() call on Line 4 in Listing 1.1. So, as shown on Line 4 of Listing 1.2, this nondet() call is replaced with i == 1. Notice that P′ has only one feasible execution path, which does not terminate. P′ is then supplied to an abstract interpretation based safety checker, which checks if P′ does not terminate. If the check succeeds, then P′ is non-terminating and hence P is also non-terminating, and a proof of Non-termination is generated for P in the form of a witness automaton. These steps are repeated until either a non-terminating execution is discovered, or test inputs are exhausted.
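The substitution step of Section 1.1 can be sketched in Python as follows. This is a minimal illustration and not FuzzNT's implementation: the representation of a test input as (line, value) pairs, the helper names `first_values` and `specialize`, the regex-based rewriting, and the toy program are all assumptions made for the example.

```python
import re


def first_values(test_input):
    """Map each source line to the first value recorded for it in the test input."""
    chosen = {}
    for line_no, value in test_input:
        # setdefault keeps the earliest value and ignores later ones for the same line
        chosen.setdefault(line_no, value)
    return chosen


def specialize(source_lines, test_input):
    """Replace the nondet() call on each covered line with its first recorded value."""
    chosen = first_values(test_input)
    result = []
    for line_no, text in enumerate(source_lines, start=1):
        if line_no in chosen:
            text = re.sub(r"\bnondet\(\)", str(chosen[line_no]), text)
        result.append(text)
    return result


# Hypothetical toy program (not Listing 1.1): line 2 runs on every loop iteration.
program = [
    "int j = nondet();",
    "while (j > 0) { int i = nondet();",
    "  j = j - i; }",
]
# Hypothetical test input: line 2 was reached twice, with values 0 and then 7.
test = [(1, 5), (2, 0), (2, 7)]
psp = specialize(program, test)
```

In this toy case the original run terminates (j drops from 5 to -2 on the second iteration), but the specialized program fixes i to 0 on every iteration, so its single path never leaves the loop. That is the kind of candidate the subsequent non-termination check is meant to confirm.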

#### 1.2 Variant of FuzzNT implemented in VeriFuzz 1.4

The version of FuzzNT in [7] uses Frama-C [14] for the abstract interpretation based Non-termination check. However, we noticed that Frama-C's abstract interpretation does not precisely model the termination semantics of standard library functions such as abort(). This leads Frama-C to incorrectly identify some terminating programs as non-terminating. Further, we could not bundle Frama-C with FuzzNT because of its installation dependencies, and it is unavailable in the competition environment of SV-COMP 2023. Therefore, we implemented a variant of FuzzNT using the C Bounded Model Checker (CBMC) [5], as described below.

Given a program P, we begin by checking if P terminates as described in Section 1.3. If this check cannot establish the termination of P, then we generate PSPs for P using VeriFuzz 1.2 (described in Section 1.1). Next, we run CBMC on each generated PSP, say P′, with a small loop unwind bound, say k, and check CBMC's built-in unwinding assertion, which checks if all loops within P′ iterate at most k times. If this check succeeds, then P′ is a terminating program. If this check fails, then there exists an input for which some loop in P′ iterates more than k times. We then iteratively increase k and repeat the termination check, up to a large enough k such as 10,000. In our experiments, we observed that while CBMC does not scale to such a large unwinding of P, it does scale to large unwindings of the PSPs of P, as they admit far fewer behaviours than P. If the check fails even at 10,000 for P′, then P′ is likely to be non-terminating. We then generate a witness automaton for P using P′, classifying P as non-terminating.
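The iterative unwinding of a single PSP can be sketched as below. This is only an illustration under stated assumptions: `unwinding_assertion_holds` is a hypothetical callback standing in for a bounded model checker run, the doubling schedule and the starting bound of 2 are guesses, and the verdict strings are invented for the example. In a real setup the callback would wrap a CBMC run (CBMC's `--unwind` and `--unwinding-assertions` options are the usual way to request such a check), but the exact invocation used by VeriFuzz is not stated in the text.

```python
def classify_psp(unwinding_assertion_holds, start_k=2, max_k=10_000):
    """Classify one PSP by repeating the unwinding-assertion check with growing k.

    unwinding_assertion_holds(k) is a caller-supplied stand-in for a bounded
    model checker run: it returns True if no loop needs more than k iterations.
    """
    k = start_k
    while True:
        if unwinding_assertion_holds(k):
            # Every loop finishes within k iterations, so the PSP terminates.
            return "TERMINATING", k
        if k >= max_k:
            # Still failing at the largest bound: treated as likely non-terminating.
            return "LIKELY_NON_TERMINATING", k
        # Doubling is an assumed growth schedule, capped at max_k.
        k = min(k * 2, max_k)


# Stand-ins for two PSPs: one whose loop runs 37 times, one that loops forever.
terminating = classify_psp(lambda k: k >= 37)
diverging = classify_psp(lambda k: False)
```

Note that the second verdict is a heuristic classification rather than a proof: a loop that needs more than `max_k` iterations would be classified the same way.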

#### 1.3 Technique for termination

To check if a given program P terminates on all inputs, we designed an unsound, but high confidence, incremental verification technique based on Bounded Model Checking (CBMC). This technique works in two phases. Phase-1 is the same as CBMC's own termination check. In this phase, we begin by unwinding all the loops in P for a small number of iterations, say k, such as 2. Then, using CBMC's built-in loop unwinding assertion check, we verify if all loops terminate within this unwinding. If this check is successful, then all loops in P terminate within k iterations and hence P itself terminates, and we return TRUE to declare P to be terminating. If the check fails for any loop, then that loop can iterate more than k times. So, we increment k, and repeat the check. This approach suffers from two limitations: (1) as k grows larger, BMC suffers from scalability issues, and (2) if P has a feasible non-terminating path, then the check for a higher k repeats forever. To overcome these limitations, we stop Phase-1 and return UNKNOWN as soon as k reaches a threshold value (pre-configured for SV-COMP 2023). We then proceed to Phase-2, described below.

In Phase-2, we try to find a small model for the termination property of P, by guessing a small range R of the inputs (viz. nondet() calls), such that if P terminates for all inputs in R, then P is highly likely to terminate for all its inputs. To guess this R, we learnt a Decision Tree (DT) model from training data comprising less than 10% of the SV-COMP benchmarks, based on program features and sample execution traces. We are working on formalizing this approach via ranking functions [6].

We then run the incremental verification from Phase-1, but with the nondet() values bounded to those in R. This bounding allows CBMC's backend solvers such as Z3 to scale to a larger loop unwind bound K (∼100,000 in our experiments). If all loops in P terminate within at most K iterations given the R-bounding, then we assume that P is highly likely to terminate on all inputs even without R-bounding. Therefore, if this bounded value check concludes that P terminates, then we return TRUE to declare P to be terminating; otherwise we return UNKNOWN and invoke the non-termination check described in Section 1.2.
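The two phases can be illustrated with a toy model in Python. Everything here is an assumption made for the sketch: a program is modelled as a function from an input to its loop iteration count (or `None` if it diverges), the unwinding-assertion check is simulated by enumerating inputs instead of calling a model checker, the range R is passed in by hand rather than guessed by a decision tree, and the doubling schedule and thresholds are placeholders.

```python
def holds_within(iterations, inputs, k):
    """Stand-in for an unwinding-assertion check: do all inputs finish in <= k iterations?"""
    for x in inputs:
        n = iterations(x)
        if n is None or n > k:  # None models a non-terminating run
            return False
    return True


def incremental_check(iterations, inputs, max_k):
    """Double k from 2 up to max_k; True as soon as the check holds, else None (UNKNOWN)."""
    k = 2
    while k <= max_k:
        if holds_within(iterations, inputs, k):
            return True
        k *= 2
    return None


def two_phase(iterations, all_inputs, small_range, threshold=64, big_k=100_000):
    # Phase 1: all inputs, small unwinding threshold.
    if incremental_check(iterations, all_inputs, threshold):
        return "TRUE"
    # Phase 2: inputs restricted to the guessed range R, much larger bound K.
    if incremental_check(iterations, small_range, big_k):
        return "TRUE"  # high-confidence verdict, not a proof
    return "UNKNOWN"


countdown = lambda x: x                             # models: while (x > 0) x--;
diverges_late = lambda x: x if x < 1000 else None   # loops forever when x >= 1000

verdict_ok = two_phase(countdown, range(100_000), range(16))
verdict_wrong = two_phase(diverges_late, range(100_000), range(16))
```

Both calls return TRUE. The first verdict is correct, while the second is wrong because the divergent inputs lie outside the chosen range, which shows concretely why the technique is described as unsound but high confidence.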

### 2 Software Architecture

Fig. 1. VeriFuzz 1.4 architecture

Figure 1 shows the architecture of VeriFuzz 1.4. Here P is the input program, and φ is the termination property. The process-blocks Phase-1 Term-Check and Phase-2 Bounded-Value Term-Check together constitute our two-phase termination check described in Section 1.3. If both Phase-1 and Phase-2 return UNKNOWN, we then execute the Non-termination check described in Section 1.2. That is, we first generate PSPs using VeriFuzz 1.2, and search for a likely non-terminating PSP, say P′. If we find such a P′, we generate a witness automaton and return FALSE (to report non-termination). Otherwise, all the above steps have returned UNKNOWN, VeriFuzz 1.4 is unable to decide if P is terminating or non-terminating, and hence it returns UNKNOWN.
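The order in which the process blocks are consulted can be summarised by the following sketch. The function name, the use of zero-argument callables for the three blocks, and the string placeholder for the witness are assumptions for illustration; the actual tool wires these stages together differently in C++ and Python.

```python
def verifuzz_verdict(phase1, phase2, find_nonterminating_psp):
    """Combine the three checks in the order described for Figure 1.

    Each argument is a zero-argument callable standing in for one process block:
    phase1 and phase2 return "TRUE" or "UNKNOWN"; find_nonterminating_psp
    returns a PSP (any object) or None.
    """
    if phase1() == "TRUE":
        return "TRUE", None
    if phase2() == "TRUE":
        return "TRUE", None
    psp = find_nonterminating_psp()
    if psp is not None:
        # Placeholder for generating the witness automaton from the PSP.
        return "FALSE", f"witness built from {psp}"
    return "UNKNOWN", None


unknown = lambda: "UNKNOWN"
result_true = verifuzz_verdict(lambda: "TRUE", unknown, lambda: None)
result_false = verifuzz_verdict(unknown, unknown, lambda: "psp_7")
result_unknown = verifuzz_verdict(unknown, unknown, lambda: None)
```

Passing callables rather than precomputed results keeps the later, more expensive stages from running once an earlier stage has already produced a verdict.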

In Figure 1, VeriFuzz 1.2 is built using the PRISM [8] program analysis framework, AFL [16], and CBMC v5.67.0 [1] with Z3 4.8.15 [12] and Glucose Syrup [2] as the backend SMT and SAT solvers, respectively. The DT model used in Phase-2 of the termination check (see Section 1.3) is trained offline using booster trees [3]. The rest of VeriFuzz 1.4 is implemented in C++ and Python.

### 3 Strengths and Weaknesses

Out of 1043 Termination tasks in SV-COMP 2023, our two-phase technique correctly solved 865. Some of these, such as termination-crafted/easy2-2.c and termination-dietlibc/atoi.c, contain loops that iterate an arbitrarily large number of times. Hence, while BMC fails to conclude their termination, our approach succeeds as it limits the number of loop iterations by restricting the inputs to a small range. Some tasks, such as termination-restricted-15/Sunset.c, terminate within the value ranges guessed during Phase-2, but do not terminate for some inputs that lie outside these ranges. Thus, we wrongly reported them to be terminating.

Out of 766 Non-termination tasks, our Non-termination technique correctly solved 351. Of these, tasks such as systemc/pipeline.cil-1.c have complex control and data dependencies, and could not be solved by approaches such as those in UAutomizer [9] and Symbiotic [15]. However, the PSPs of these programs, generated by our technique, were much simpler to check for non-termination, and hence our technique succeeded on them. On the other hand, if all the PSPs we generate within the given time limits happen to be terminating, then our technique fails to identify the non-termination. Our results on the tasks locks/test_locks_14-2.c and termination-restricted-15/Ex02.c demonstrate this behaviour. Another weakness is that our technique currently does not handle programs with recursion. We are currently developing new techniques that address these weaknesses.

### 4 Tool Configuration and Setup

VeriFuzz 1.4 is available at git@gitlab.com:sosy-lab/sv-comp/archives-2023.git. To install and run the tool, follow the instructions in README.txt. The BenchExec tool-info module is verifuzz.py and the benchmark definition file is verifuzz.xml. A sample run command is as follows: ./scripts/verifuzz.py --propertyFile termination.prp example.c. In SV-COMP 2023, VeriFuzz opts to participate in the Termination, ReachSafety, and Overflow categories.

### 5 Software Project and Contributors

VeriFuzz is developed and maintained by the authors at TCS Research. We thank everyone who has contributed to the development of VeriFuzz and the tools AFL, PRISM, CBMC, Glucose Syrup, and Z3. Contact: verifuzz.tool@tcs.com.

### 6 Data-Availability Statement

VeriFuzz 1.4 is available as part of the SV-COMP 2023 verifier repository at https://gitlab.com/sosy-lab/sv-comp/archives-2023/-/blob/main/2023/verifuzz.zip. For any queries, please contact the authors at verifuzz.tool@tcs.com.

### References



### **Author Index**

#### **A**

Abdulla, Parosh Aziz I-588 Abdulla, Parosh I-105 Aggarwal, Saksham I-666 Agrawal, Sakshi II-588 Albert, Elvira I-448 Aljaafari, Fatimah II-541 Amir, Guy I-607 Anand, Ashwani II-211 Andreotti, Bruno I-367 Apinis, Kalmer II-453 Atig, Mohamad Faouzi I-588 Atig, Mohamed Faouzi I-105 Avigad, Jeremy II-74 Ayaziová, Paulína II-523

#### **B**

Bach, Jakob I-407 Bajwa, Ali I-308 Balachander, Mrudula II-309 Banerjee, Anindya II-133 Barbosa, Haniel I-367 Barrau, Florian II-3 Barth, Max II-577 Bassan, Shahaf I-187 Batz, Kevin II-410 Bentkamp, Alexander II-74 Beutner, Raven I-145 Beyer, Dirk II-152, II-495 Biere, Armin I-426 Blanchette, Jasmin II-111 Bonakdarpour, Borzoo I-29, I-66 Bouma, Jelle II-19 Bruyère, Véronique I-271

#### **C**

Cadilhac, Michaël II-192 Chadha, Rohit I-308

Chakraborty, Supratik II-588 Chalupa, Marek II-535 Chatterjee, Krishnendu I-3 Chen, Mingshuai II-410 Chien, Po-Chun II-152 Chimdyalwar, Bharti II-588 Chin, Wei-Ngan I-569 Cimatti, Alessandro II-3 Cooper, Martin C. I-167 Cordeiro, Lucas C. II-541 Corfini, Sara II-3 Correas, Jesús I-448 Corsi, Davide I-607 Cortes, João II-55 Cristoforetti, Luca II-3

#### **D**

Darke, Priyanka II-588 de Gouw, Stijn II-19 de la Banda, Alejandro Stuckey I-666 de Pol, Jaco van II-353 Deligiannis, Pantazis II-433 Denis, Xavier II-93 Di Natale, Marco II-3 Dietsch, Daniel II-577, II-582 Dimitrova, Rayna II-251 Doveri, Kyveli I-290 Duan, Zhenhua II-571

#### **E**

Erhard, Julian II-547 Ernst, Gidon II-559 Etman, L. F. P. II-44 Eugster, Patrick I-126

#### **F**

Fang, Wenji II-11 Farinelli, Alessandro I-607

© The Editor(s) (if applicable) and The Author(s) 2023 S. Sankaranarayanan and N. Sharygina (Eds.): TACAS 2023, LNCS 13994, pp. 601–604, 2023. https://doi.org/10.1007/978-3-031-30820-8

Fedyukovich, Grigory II-270 Fichtner, Leonard II-577 Filiot, Emmanuel II-309 Finkbeiner, Bernd I-29, I-145 Fokkink, W. J. II-44 Fuchs, Tobias I-407 Furbach, Florian I-588

#### **G**

Ganty, Pierre I-290 Godbole, Adwait A. I-588 Goorden, M. A. II-44 Gordillo, Pablo I-448 Griggio, Alberto II-3 Guo, Xingwu I-208 Gupta, Ashutosh I-105 Gutierrez, Julian I-666

#### **H**

Hadži-Đokić, Luka I-290 Hahn, Ernst Moritz I-527 Hamza, Ameer II-270 Harel, David I-607 Hartmanns, Arnd I-469 Havlena, Vojtěch I-249 Heim, Philippe II-251 Heisinger, Maximilian I-426 Heizmann, Matthias II-577, II-582 Hendi, Yacoub G. I-588 Hendriks, D. II-44 Henzinger, Thomas A. I-3, II-535 Herasimau, Andrei II-473 Heule, Marijn J. H. I-329, I-348, I-389 Hoenicke, Jochen II-577 Hofkamp, A. T. II-44 Hsu, Tzu-Han I-29, I-66 Huang, Xuanxiang I-167 Hussein, Soha II-553

#### **I**

Iser, Markus I-407

#### **J**

Jaber, Nouraldin II-289 Jacobs, Swen II-289 Jakobsen, Anna Blume II-353 Jansen, Nils I-508 Jongmans, Sung-Shik II-19

Jourdan, Jacques-Henri II-93 Junges, Sebastian I-469, I-508, II-410

#### **K**

Kaminski, Benjamin Lucien II-410 Karmarkar, Hrishikesh II-594 Katoen, Joost-Pieter II-391, II-410 Katz, Guy I-187, I-208, I-607 Kiesl-Reiter, Benjamin I-329, I-348 Klumpp, Dominik II-577, II-582 Kobayashi, Naoki I-227 Kokologiannakis, Michalis I-85 Konnov, Igor I-126 Korovin, Konstantin I-647 Kovács, Laura I-647 Krishna, S. I-105 Krishna, Shankara N. I-588 Kukovec, Jure I-126 Kulkarni, Milind II-289 Kullmann, Oliver II-372 Kumar, Shrawan II-588

#### **L**

Lachnitt, Hanna I-367 Lal, Akash II-433 Larsen, Casper Abild II-353 Lechner, Mathias I-3 Lee, Nian-Ze II-152 Lefaucheux, Engel I-47 Lengál, Ondřej I-249 Lester, Martin Mariusz II-173 Li, Jianwen II-36 Li, Yong I-249 Lima, Leonardo II-473 Lovett, Chris II-433 Lynce, Inês II-55

#### **M**

Malík, Viktor II-529 Mallik, Kaushik II-211 Manino, Edoardo II-541 Manquinho, Vasco II-55 Marmanis, Iason I-85 Marques-Silva, Joao I-167 Marzari, Luca I-607 Matheja, Christoph II-410 McCamant, Stephen II-553 Medicherla, Raveendra Kumar II-594 Meggendorfer, Tobias I-489

Melham, Tom I-549 Menezes, Rafael II-541 Metta, Ravindra II-594 Meyer, Roland I-628 Michaelson, Dawn I-348 Miné, Antoine II-565 Mir, Ramon Fernández II-74 Monat, Raphaël II-565 Moormann, L. II-44 Morgado, Antonio I-167

#### **N**

Nagasamudram, Ramana II-133 Naouar, Mehdi II-577 Naumann, David A. II-133 Nayak, Satya Prakash II-211 Nayyar, Fahad II-433 Nečas, František II-529

#### **O**

Osama, Muhammad I-684 Otoni, Rodrigo I-126 Ouadjaout, Abdelraouf II-565 Ouaknine, Joël I-47

#### **P**

Pai, Rekha I-549 Park, Seung Hoon I-549 Pavlogiannis, Andreas II-353 Pérez, Guillermo A. I-271, II-192 Perez, Mateo I-527 Pietsch, Manuel II-547 Planes, Jordi I-167 Podelski, Andreas II-577, II-582 Pu, Geguang II-36 Purser, David I-47

#### **Q**

Quatmann, Tim I-469

#### **R**

Raskin, Jean-François II-309 Raszyk, Martin II-473 Reeves, Joseph E. I-329 Reger, Giles I-647 Reijnen, F. F. H. II-44

Reniers, M. A. II-44 Román-Díez, Guillermo I-448 Rooda, J. E. II-44 Rubio, Albert I-448

#### **S**

Saan, Simmo II-547 Samanta, Roopsha II-289 Sánchez, César I-29, I-66 Sankur, Ocan II-28, II-329 Schewe, Sven I-527 Schiffelers, R. R. H. II-44 Schindler, Tanja II-577 Schmidt, Simon Meldahl II-353 Schmuck, Anne-Kathrin II-211 Schoisswohl, Johannes I-647 Schrammel, Peter II-529 Schreiber, Dominik I-348 Schulz, Stephan II-111 Schüssele, Frank II-577, II-582 Schwarz, Michael II-547 Seidl, Helmut II-547 Seidl, Martina I-426 Senthilnathan, Aditya II-433 Sharifi, Mohammadamin I-47 Sharma, Vaibhav II-553 Sharygina, Natasha I-126 Sheinvald, Sarai I-66 Shmarov, Fedor II-541 Shukla, Ankit II-372 Šmahlíková, Barbora I-249 Somenzi, Fabio I-527 Song, Yahui I-569 Spengler, Stephan I-588 Staquet, Gaëtan I-271 Steensgaard, Jesper II-353 Strejček, Jan II-523 Su, Jie II-571 Subercaseaux, Bernardo I-389

#### **T**

Thomas, Bastien II-28 Thuijsman, S. B. II-44 Tian, Cong II-571 Tilscher, Sarah II-547 Tonetta, Stefano II-3

Traytel, Dmitriy II-473 Trivedi, Ashutosh I-527 Tuppe, Omkar I-105 Turrini, Andrea I-249

#### **V**

Vafeiadis, Viktor I-85 van Beek, D. A. II-44 van de Mortel-Fronczak, J. M. II-44 van der Sanden, L. J. II-44 van der Vegt, Marck I-508 Venkatesh, R II-588 Verbakel, J. J. II-44 Viswanathan, Mahesh I-308 Vogel, J. A. II-44 Vojdani, Vesal II-453, II-547 Vojnar, Tomáš II-529 Voronkov, Andrei I-647 Vukmirović, Petar II-111

#### **W**

Wagner, Christopher II-289 Wang, Yuning II-229 Weininger, Maximilian I-469 Whalen, Michael W. I-348, II-553 Wies, Thomas I-628 Wijs, Anton I-684

Winkler, Tobias II-391 Wojtczak, Dominik I-527 Wolff, Sebastian I-628 Wu, Minchao I-227

#### **X**

Xiao, Shengping II-36 Xing, Hengrui II-571

#### **Y**

Yan, Qiuchen II-553 Yang, Jiyu II-571 Yang, Luke I-666 Yang, Zuchao II-571 Yeduru, Prasanth II-594 Yerushalmi, Raz I-607 Yuan, Simon II-473

#### **Z**

Zhang, Chengyu II-36 Zhang, Hongce II-11 Zhang, Min I-208 Zhang, Minjian I-308 Zhang, Yueling I-208 Zhou, Ziwei I-208 Zhu, He II-229 Žikelić, Đorđe I-3